ABSTRACT


Block diagram schemata model computation systems in the context of an
external environment. The environment imposes various constraints on the real-time
performance of any implementation of a block diagram schema. The model is used
to provide precise definitions of real-time performance. The portion of the
implementation that affects the real-time performance is called the control
structure.

This  research  investigates  several  strategies  for  synthesizing  control  structures 
to  satisfy  the  external  real-time  specifications.  The  simplest  strategy  is  to 
execute  all  the  blocks  in  the  diagram  in  some  fixed  order.  Control  structures  of 
this  type  have  been  somewhat  ignored  for  time  critical  applications.  The  synthesis 
problem  is  shown  to  be  solvable  in  the  sense  that  acyclic  control  structures  do  not 
need  to  be  considered.  A branch-and-bound  synthesis  algorithm  is  presented  which 
requires  exponential  time  in  the  worst  case.  Although  no  efficient  synthesis 
algorithm was found, the conjecture that the problem is NP-complete is not proved.

The other strategy for implementing control structures makes use of the fact
that in some applications the input values change at discrete times. Under this
assumption,  block  diagram  schemata  are  similar  to  traditional  models  of  real-time 
computations.  An  efficient  algorithm  for  assigning  fixed  priorities  to  independent 
tasks is presented that guarantees the real-time specifications will be met. This
algorithm relaxes the previous restriction that the deadline for a task coincide
with its next request.

Finally,  some  of  the  issues  involved  with  multiple  processor  control  structures  are 
discussed,  although  no  specific  algorithms  are  investigated. 


Key Words and Phrases: real-time scheduling, priority scheduling, deadline-driven
scheduling,  control  structures 
1: Introduction

There are many applications for computers where the real-time performance of
the program is critical. These applications all involve asynchronous interaction with
the external environment, and it is this environment that imposes the real-time
restrictions. For example, device drivers in operating systems must respond to
interrupts before the information is lost. Another application is in direct digital
control and process monitoring.

However, most high-level languages are not designed for producing time-critical
programs. The languages allow the user to define appropriate functional and data
abstractions for his problem, but have no notion of real time or asynchronous
interaction with the real world. Instead, the user must design a control structure
for his problem suitable for a single sequential process that will satisfy all the
real-time constraints.

1.1:  Previous  Work 

Many operating systems do have notions of real time and external input and
output, but they are supported at a fairly low level [18, 20]. The application
program typically has to deal with priorities, setting real-time alarms, and responding
to interrupts. These actions may be necessary to satisfy the constraints, but they
do not bear a close relationship to the constraints. For example, it is seldom
obvious what priority must be assigned to a task that must complete in ten
milliseconds and uses one millisecond of CPU time.

Early work on applications-oriented real-time operating systems was done by
Fiala [6]. Fiala proposed a model of real-time processes characterized by three
parameters per process.

(1) P_i, the maximum CPU time used by process i.

(2) D_i, the maximum delay allowed from the time process i requests service
    to the completion of servicing that request.

(3) T_i, the minimum period between requests for process i.

Fiala proposes three scheduling algorithms for this model. The first (and
simplest) executes the process that must complete the soonest, i.e. the process
with the earliest deadline. This algorithm is optimal in the sense that if any
schedule satisfies the deadline requirements for all the processes, so does the
earliest deadline schedule. However, this result is proved in the context of
process switching requiring negligible overhead.
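
The earliest deadline rule can be illustrated with a minimal sketch. It is not
Fiala's implementation: the names Request, pick_next and edf_meets_deadlines are
hypothetical, time is assumed to advance in integer ticks, every process is
assumed to request service exactly once per minimum period starting at time 0,
and process switching is assumed to cost nothing.

```python
from dataclasses import dataclass


@dataclass
class Request:
    process: str
    deadline: int    # absolute deadline: request time + maximum allowed delay
    remaining: int   # CPU ticks still needed (at most the maximum CPU time)


def pick_next(pending):
    """Return the pending request with the earliest deadline, or None if idle."""
    return min(pending, key=lambda r: r.deadline, default=None)


def edf_meets_deadlines(tasks, horizon):
    """Simulate earliest deadline scheduling for `horizon` ticks.

    tasks: list of (name, cpu_time, max_delay, period), all positive integers.
    Assumption: each process requests service at 0, period, 2*period, ...
    """
    pending = []
    for now in range(horizon):
        for name, cpu_time, max_delay, period in tasks:
            if now % period == 0:
                pending.append(Request(name, now + max_delay, cpu_time))
        # A request still pending at its deadline has missed it.
        if any(r.deadline <= now for r in pending):
            return False
        job = pick_next(pending)
        if job is not None:
            job.remaining -= 1          # run the chosen process for one tick
            if job.remaining == 0:
                pending.remove(job)
    return not any(r.deadline <= horizon for r in pending)
```

For instance, under these assumptions edf_meets_deadlines([("a", 1, 2, 2),
("b", 2, 4, 4)], 8) reports that both processes meet their deadlines, while
raising the CPU time of "a" to 2 ticks overloads the processor.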

Fiala's second algorithm is a modification of the earliest deadline scheduler that
minimizes the number of process switches while retaining the optimality condition of
the earliest deadline algorithm. This is accomplished by having the scheduler check
whether the current process must be preempted when a process with an earlier
deadline requests service. This is done by simulating the action of the scheduler
on the current requests. Unfortunately, this algorithm would require extensive
computation whenever a process requests service. Accordingly, Fiala's third
algorithm pre-computes a lower bound on the expression required by the minimum
switching algorithm. With the lower bound, the third algorithm requires only an
extra comparison at process request time. The algorithm is also optimal in the
same sense and requires less overhead than the simpler earliest deadline algorithm.

However, Fiala makes no attempt to integrate his model and scheduler into a
real-time language system. One such approach is control robotics, developed by
Dertouzos [3] and Geiger [6]. A control robotics program is organized as a set of
daemons which continuously monitor some condition and execute the body (a
corrective procedure) when the condition is true. The real-time specifications for a
daemon are the delay from when a condition becomes true to when the program
detects the condition (the recognition time) and the delay from detecting a
condition to executing the body (the response time). Geiger's implementation of
control robotics periodically samples the condition with a period slightly less than
the recognition time (the slightly higher rate allows for preemption by other
daemon conditions). The daemon bodies are scheduled using an earliest deadline
scheduler.

One weakness of control robotics is that no guarantee of satisfying the real-time
constraints is made at compile time. This could be done if the user declared a
minimum period between executions of a daemon body and the compiler determined
the computation time of the daemon bodies. Since it is impossible to determine the
computation time for an arbitrary procedure, the compiler may require declarations
to determine the computation time.

A more substantial problem of Geiger's implementation is the assumption that the
conditions for daemons are independent of the execution of other daemon bodies.
Therefore, complex structures of daemons whose conditions depend on variables
changed by other daemons could result in much unnecessary computation. All in all,
control robotics does not provide any more of a model for real-time programming
than Fiala's work beyond suggesting some syntax for identifying tasks and
specifying their deadlines.

Another  system  that  deals  with  real-time  specifications  at  the  user  level  is 
TOMAL  (Task  Oriented  Microprocessor  Language)  [12].  On  the  surface  TOMAL  is  a 
combination of a modern block structured programming language and a typical
minicomputer 'real-time' operating system. However, in addition to assigning static
priorities to tasks, a response time may be specified for a task. This response
time is similar to the recognition time for control robotics and specifies the maximum
delay  between  a request  for  a task  activation  and  the  initiation  of  that  task. 
Another  feature  of  TOMAL  is  that  interrupt  routines  only  request  task  activation  and 
do  not  respond  to  the  interrupt  in  any  substantive  way.  This  reduces  the  amount 
of  object  code  that  does  not  run  under  the  task  scheduler  and  allows  the  TOMAL 
system  to  check  the  consistency  of  the  real-time  constraints  for  the  entire  system. 
However,  TOMAL  makes  no  attempt  to  verify  real-time  specifications  on  service 
times  for  tasks. 

Data flow schemata deserve mention as a real-time system since one proposed
application is digital signal processing [2, 22]. Data flow is designed to facilitate
highly parallel computation, and statements may be executed as soon as all their
input variables have been computed. If several statements are executable, an
arbitrary statement is chosen. However, with the addition of real-time constraints to
mediate this decision, data flow would be a powerful real-time system. The other
major drawback of data flow is that it is not suited for implementation on
conventional computer architectures.
1.2: Statement of the Problem

The goal of this research is to develop theory that is applicable to the
implementation of a programming system designed for the restricted domain of
time-critical applications. The main criterion of the suitability of the language to this
domain should be that small changes in the real-time specifications should result in
small, obvious changes in the source program. It is conceivable, and indeed
desirable, that these changes could have a dramatic effect on the object program
produced. This reorganization of the object program is precisely the process that
should be automated.

Conventional  languages  already  provide  facilities  for  functional  and  data 
abstraction,  and  numerous  researchers  are  already  working  in  this  area.  Therefore, 
this research will focus on the global control structure for programs. This includes
issues such as the number of processors to use in an implementation, deciding
what interrupt structure (if any) is necessary, decomposing the program into tasks,
and  assigning  parameters  required  by  the  appropriate  task  scheduler. 

Since normal language semantic issues are being avoided, the description of a
program can be made extremely simple. The intuitive model for a real-time program
is that of continuous time analog block diagrams. The program will
be specified as a directed graph of actions to be performed and their functional
dependence, with arcs of the graph representing data paths. The graph defines a
precedence relation among operators identical to the data flow in the diagram. The
graph must be acyclic since cycles in a block diagram represent feedback systems.
Automatically producing an object program that solves the feedback equation would
require more detailed semantics for the programs as well as other disciplines outside
the scope of this research. However, in some special cases, cycles can be handled by
rearranging the block diagram. A strict upper bound must be placed on the
computation time required for each action. The real-time constraints specify upper
bounds of the propagation delays through the block diagram and of the bandwidths
of the input and output signals.


1.3:  Thesis  Overview 

Chapter 2 develops the block diagram model of computation. The block diagram
model is a program schematic model similar to data flow. However, real time and an
external environment are explicit in the model. In addition, the block diagram model
separates the data flow of the schema from the control flow, which is embodied in
the control structure. The control structure specifies the execution order of the
blocks at object time. The research problem may be formalized as finding control
structures for block diagram schemata which satisfy the given real-time
specifications. The major use of the model is to define the semantics of the
real-time specifications.

Chapter 3 investigates various static control structures (control structures that
are independent of the data values at object time). Although static control
structures may be used widely in specific applications (particularly in small,
dedicated systems such as those implemented on microcomputers), they have been
ignored by designers of real-time programming systems, mainly because their
real-time performance in the general case has not been studied.

Chapter 4 investigates extended semantics where the external inputs do not
change continuously. In this situation, a dynamic control structure may be used. A
dynamic control structure is a control structure that does depend on the data
values at object time. The chapter investigates a subclass of dynamic control
structures, namely static priority interrupt control structures. The prototypical
example is an interrupt system where the system does nothing until an input
changes, although the subclass includes systems without physical interrupts where
the inputs are sampled. The priorities are static, as opposed to the earliest deadline
scheduler where the priority of a task is a function of time.

Chapter 5 discusses some of the issues that arise when more than one
processor is available for the implementation. The real-time performance of
multiprocessor systems is analyzed and the real-time performance of a block
diagram schema is bounded. Some techniques for distributing the processing among
several processors are suggested, although specific algorithms are not studied.
2: Block Diagram Schemata

Most  models  of  computation  do  not  capture  the  notion  of  a "real-time"  system 
which  monitors  continuously  changing  inputs  from  some  external  environment.  Block 
diagram  schemata  model  the  external  environment  explicitly  and  recognize  the 
existence  of  real-time  specifications  placed  by  the  environment  on  the  computing 
mechanism. They are based on the intuitive model of the conventional analog block
diagram whose inputs and outputs are changing continuously. An (m,n) block
diagram schema consists of an (m,n) block diagram module, a control structure, a
configuration and an environment which manipulates the configuration
asynchronously with the control structure. Within the model, it is assumed that
values  change  continuously.  Obviously,  the  computations  cannot  be  performed 
continuously  on  a digital  computer.  The  real-time  specifications  determine  how 
often  the  control  structure  must  compute  new  values,  as  well  as  how  fast  it  must 
compute  them. 

An (m,n) block diagram module is a directed graph whose nodes are either
blocks or links. The terms predecessor and successor will be used with the
conventional definitions. Data is stored in the links while the blocks perform the
actual computation. Accordingly, only one arc may point to each link. The graph
must be proper in the sense that arcs may not point from links to links or from
blocks to blocks. Uppercase letters will be used to denote blocks and lowercase
letters to denote links. The predecessor of a link is called the specifier of that
link and the successors of a link are called the watchers of the link. The
predecessors and successors of a block are called the inputs and outputs of the
block respectively.

An (m,n) module has m links with no input arcs (input links) and n links with no
output arcs (output links). The input links receive their values from an external,
continuous time function called the input signal. The values at the output links
define an external, continuous time function called the output signal.

The  model  assumes  the  existence  of  a global  clock  which  defines  the  passage 
of  real  time.  Hewitt  argues  against  the  use  of  global  clocks  since  they  cannot  be 
implemented  in  distributed  systems  [9].  While  Hewitt's  objections  against  global 
clocks  are  valid,  assigning  times  within  Hewitt's  framework  of  local  orderings  would 
be more complicated. This complexity is unnecessary since the events being timed
are  always  ordered  by  one  of  Hewitt's  local  orderings. 

A configuration is an assignment of tokens to the links of a schema. The token
contains a value and a set of labels of the form (link, birth). These labels indicate
when the token arrived at the input link link. Each link always contains some
token, since signals are always defined in a continuous time block diagram.

The computation of a block diagram schema is described by a series of
snapshots. A snapshot consists of a block diagram module and an associated
configuration. The initial snapshot assigns null values to all tokens except for
tokens on the input links of the schema, which are assigned the current value of
the input signal. The label set of all links is initialized to {(link, 0)}. The
computation proceeds from one snapshot to the next through the firing of blocks.
The control structure is the strategy for choosing which block to fire next. The
fired block accesses the tokens on its input links, and replaces the tokens on its
output links. The label set for the output token becomes the union of the old label
set of the token and the label sets that were assigned to the tokens on all the
input links of the block. The time in the label (l, t) for the link l at each input arc
of the fired block is replaced by the label (l, time), where time is the current
contents of the global clock. This action occurs after any tokens have been
replaced on the output links, but the time for the new label sets is immediately
after the input tokens were accessed. In addition, if l is an input link, its value is
set to the current value of the input signal. The block need not replace any output
tokens. This differs from data flow since tokens are not removed from the input
links after a block is fired. The data flow restriction is not appropriate since the
value of a token is defined at all times.

The amount of computation time used by block A is denoted t_A. If the control
structure fires block A on some processor at time t, that processor will complete
and replace the output tokens on that block by the time t + t_A. The computation
times used will be upper bounds either computed by whatever language processor
is used to create the primitive blocks or declared by the user.


2.1:  Real-Time  Performance  and  Specifications 

A block diagram schema is an approximation to a continuous time block diagram.
There are many factors affecting the quality of the approximation. However, the
factors influenced by the control structure are how long the schema takes to
compute the values of output tokens from the input tokens, and how often it
performs these computations. The real-time specifications will place bounds on
these quantities. A control structure that satisfies all the real-time specifications is
called a feasible control structure.

The age of a token with respect to a link l at time t is defined as t - t_0 if (l, t_0)
is in the label set of the token, and undefined otherwise. The latency between
links a and b is denoted l_{a,b} and is the upper bound of the age at any time of
tokens at b with respect to link a. The user can specify an upper bound on the
latency between two links. The first link will be an input link of the schema and
the second link will be an output link.
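
The definition of age can be written down directly. The sketch below is only an
illustration: it assumes a label set is modeled as a dictionary from link name to
birth time, and the helper names age and worst_observed_age are hypothetical. A
maximum over finitely many observations only bounds the latency from below, since
the latency is defined over all times.

```python
def age(labels, link, now):
    """Age of a token with respect to `link` at time `now`.

    labels: the token's label set, modeled here as a dict {link: birth}.
    Returns None when the age is undefined (no label for `link`).
    """
    if link not in labels:
        return None
    return now - labels[link]


def worst_observed_age(history, link):
    """Largest defined age over a list of (time, labels) observations."""
    ages = [age(labels, link, t) for t, labels in history]
    return max((a for a in ages if a is not None), default=None)
```

For example, a token carrying the label (a, 3) and observed at time 10 has age 7
with respect to a, and no defined age with respect to any other link.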

Latency specifications can also be expressed in terms of continuous-time
functions:

    b(t) = F(a(t - Δ(t)), ...),    Δ(t) ≤ l_{a,b}        (2-1)

Here b(t) is the function whose value is the value of the token at link b at time t;
a(t) is the function whose value is the signal at link a at time t; Δ(t) corresponds
to the age of the tokens at link b. Notice that Δ(t) is generally not constant, but
is bounded. The user knows how close b(t) must be to the ideal value F(a(t), ...).
Using information about the magnitude of F and a and their derivatives, the user can
use equation (2-1) to calculate the latency specifications necessary to achieve the
desired accuracy of b(t).

The other measure of real-time performance is how often new values are
computed. The bandwidth from link a to link b (notation B_{a,b}) is the maximum rate
at which the control structure must compute new values at b from values at a. The
bandwidth specification is not easily expressible in terms of continuous-time
functions. It may be thought of as a requirement on how often the value of b(t)
must change.

The bandwidth specification may seem superfluous since the latency
specification also implies how often the value of b(t) changes. However, it is
possible for a multiple processor control structure to exhibit bandwidth performance
that exceeds the rate implied by the latency specification. An example is shown in
figure 2-1.


[Figure 2-1: A block diagram schema requiring a multi-processor control structure.
Annotations recovered from the figure: t_A = 10 msec, t_B = 10 msec,
B_{a,c} = 75/sec, l_{a,c} = 40 msec.]

In this example, both A and B require ten milliseconds of computation time. A
single processor control structure that executes ABABAB ... can guarantee a
latency from a to c of forty milliseconds and a bandwidth from a to c of fifty per
second. However, if processor one executes AAA ... and processor two
executes BBB ..., then the latency from a to c is still only forty milliseconds but
the bandwidth increases to one hundred per second.
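
The single processor figures quoted above can be checked with a short simulation.
This is a sketch under stated assumptions, not the author's method: the function
name simulate_single_processor is hypothetical, the schema is assumed to be the
chain a -> A -> b -> B -> c, a block is assumed to read its input when fired and
to write its output exactly t_block msec later, and only one processor is
modeled. The two processor case is not covered, since it also depends on the
phase between the processors.

```python
def simulate_single_processor(order, t_block=10, cycles=5):
    """Repeat `order` (a string over "A" and "B") on one processor.

    Assumed chain: a -> A -> b -> B -> c, every block taking t_block msec.
    Returns (worst-case age at c with respect to a, in msec,
             guaranteed rate of new values at c, per second).
    """
    now = 0
    birth_b = None        # when the value now on link b was sampled from a
    birth_c = None        # when the value now on link c was sampled from a
    last_write_c = None   # when link c was last replaced
    worst_age = 0
    longest_gap = 0
    for _ in range(cycles):
        for block in order:
            start = now
            now += t_block                # the block completes t_block later
            if block == "A":
                birth_b = start           # A samples the input when it is fired
            elif block == "B" and birth_b is not None:
                if birth_c is not None:
                    # The old token on c is oldest just before it is replaced.
                    worst_age = max(worst_age, now - birth_c)
                    longest_gap = max(longest_gap, now - last_write_c)
                birth_c = birth_b
                last_write_c = now
    return worst_age, 1000 / longest_gap
```

With the default ten millisecond blocks, simulate_single_processor("AB") yields a
worst-case age of 40 msec and 50 new values per second, in agreement with the
text.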

While  the  block  diagram  model  Is  useful  for  defining  performance  for  real-time 
programs,  it  does  not  yield  many  insights  into  the  problem  of  synthesizing  a feasible 
control  structure.  The  graph  itself  resembles  a partial  order  on  a set  of  tasks,  but 
the  semantics  of  block  diagram  schemata  are  not  as  restrictive  as  this  partial 
order. In most schematic models, a task must not be executed until all its
predecessors have been executed since (presumably) it would not have data
available at all its inputs. The block diagram model has no such restriction and as
a result is able to execute some parts of the schema more often than other parts.

On the other hand, there are certain execution orders that can be ruled out
since they are obviously inefficient. For example, once a block has been fired, it
need not be fired again until one of its predecessors has been fired again since all
its inputs will be unchanged. Therefore, its outputs will not change. Similarly, if no
successor of a block A is fired between firings of A, the previous execution of A
was unnecessary since no block looked at the previous values of the tokens on
the output links of A.

If these restrictions are combined, each firing of a block must be surrounded (in
time) by at least one predecessor and at least one successor. Equivalently, the
allowable execution sequences may be found by shuffling all the paths from an
input link to an output link. These paths will be referred to as constraint paths or
just constraints.
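
The combined restriction can be turned into a simple test on a cyclic firing
sequence. The sketch below is an illustration only. The function name
has_redundant_firing and the dictionaries preds and succs are hypothetical, the
predecessor and successor relations are taken between blocks (through the
intervening links), and it is assumed that a block with no predecessor block
reads a continuously changing input, and a block with no successor block drives
an output link, so that the corresponding half of the rule does not apply to it.

```python
def has_redundant_firing(cycle, preds, succs):
    """Return True if the repeating firing sequence `cycle` wastes a firing.

    cycle: sequence of block names, repeated forever.
    preds, succs: dicts mapping each block to the set of its predecessor
                  and successor blocks.
    """
    n = len(cycle)
    for i, block in enumerate(cycle):
        # Collect the blocks fired strictly between this firing of `block`
        # and its next firing, wrapping around the end of the cycle.
        between = set()
        j = (i + 1) % n
        while cycle[j] != block:
            between.add(cycle[j])
            j = (j + 1) % n
        if preds[block] and not (preds[block] & between):
            return True     # inputs unchanged, so the refiring is useless
        if succs[block] and not (succs[block] & between):
            return True     # nobody looked at the previous outputs
    return False
```

For a two block chain in which A feeds B, the cycle AB passes the test, while AAB
and ABB are both rejected.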

2.2:  Functionality  of  Blocks 

The semantics of block diagram schemata make some useful block functions
awkward to implement. For example, a block that performs differentiation is
essential for applications in real-time process monitoring and control. In classical
direct digital control, the system is discretized by sampling at some specific period.
Differentiators are replaced by unit delays and the feedback gains are adjusted
appropriately. This is possible only because the inputs are sampled at a known
frequency.

In block diagram schemata there is no guarantee of periodic execution. The
bandwidth specifications set a lower bound on how often a block must be
executed, and a different lower bound may be implied by the latency specifications.
They do not place any upper bound on how often the block is executed.
Therefore, it is impossible to tell a priori when and how often a block will be
executed. This would seem to rule out any blocks that would require state
variables, but this is not true. A white noise generator could be implemented using
a pseudo-random number generator. This would use a state variable, but it would
not run into any problems by not knowing how often it is executed. But most other
functions that need to produce or transform a time dependent sequence of values
will be impossible to implement.

The only general solution to the problem is to have a real-time clock as part of
the system. Then a differentiation block could remember both its previous input and
the time it was last executed and compute the obvious first order approximation.
The major difficulty is that the real-time clock would have to provide much finer
resolution than the 60 cycle clocks found in typical computer systems.
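
A differentiation block of this kind might look as follows. This is a minimal
sketch: the class name Differentiator and its fire method are hypothetical, the
clock reading is passed in explicitly as `now` rather than read from a real
device, and returning 0.0 on the first firing (or when the clock has not
advanced) is an arbitrary choice made here.

```python
class Differentiator:
    """First order approximation to d(input)/dt using a real-time clock."""

    def __init__(self):
        self.prev_value = None   # input seen at the previous firing
        self.prev_time = None    # clock reading at the previous firing

    def fire(self, value, now):
        """Fire the block with the current input value and clock reading."""
        if self.prev_time is None or now == self.prev_time:
            slope = 0.0          # no earlier sample, or clock too coarse
        else:
            slope = (value - self.prev_value) / (now - self.prev_time)
        self.prev_value, self.prev_time = value, now
        return slope
```

The second branch of the test in fire shows why clock resolution matters: if two
firings fall within one clock tick, the elapsed time reads as zero and no
meaningful slope can be computed.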

The  user  should  be  able  to  define  his  own  time  dependent  functions  since  any 
selection  of  primitive  blocks  will  probably  turn  out  to  be  too  limited  for  some 
application. Therefore, it becomes necessary to provide some primitive blocks
which would probably lead to nonsensical programs if used carelessly. In particular,
if the user had a unit delay block and access to the real-time clock he could define
arbitrary approximations to differentiators, although undisciplined use of the unit
delay block would result in useless programs.

Implementing integration would still be a problem since the block diagram for a
first order integrator would contain a cycle (see figure 2-2). The problem with
cycles is that it is unclear whether the cycle represents use of a state variable,
as in data flow, or implied solution of simultaneous equations, as in continuous time
block diagrams. In the case of integrators it is clear that the cycle represents use
of a state variable, since the cycle contains a unit delay block. In this case, the
cycle can be broken at the input to the delay block. The delay block is treated as
a watcher of link e, even though it gets its input from link f. This transformation
alters the order in which the blocks are executed by changing the constraint
paths. Unit delays were handled by a similar transformation in BLODI [11], a
system for simulating discrete time block diagrams, and would be handled in the
same way by a programmer [21].


[Figure 2-2: A Block Diagram Containing a Cycle. Recovered label: time.]
2.3: Example

The  interaction  between  the  real-time  specifications  and  the  control  structure 
can  be  illustrated  by  a series  of  examples.  In  these  examples  the  block  diagram 
module is left unchanged while the latency and bandwidth specifications are varied.
These variations will necessitate changes in the control structure used to
implement the block diagram schema. The block diagram module itself is shown in
figure 2-3.


[Figure 2-3: Typical block diagram schema]

The simplest control structures to consider are cycles that repeatedly execute
the blocks in some fixed order. There are 3! (= 6) ways of executing four blocks once
per cycle (ignoring starting transients). For a small example like this it is feasible
to enumerate all such cycles and test them to see if they satisfy the latency
constraints (see footnote 1). All these control structures are independent of when
new tokens actually arrive. The worst-case assumption is that a new token arrives
immediately after the previous token is marked old. This assumption is used in
calculating worst-case latencies, which are shown in figure 2-4. Notice that although
ABCD is better than ACBD and ADBC is better than ADCB, there is no best control
structure. In fact, we can choose latency specifications such that only one of the
control structures will work. The first six control structures in figure 2-4 sample the
inputs once per cycle, i.e. once every 30 time units. However, if any of the bandwidths
B_{a,c}, B_{a,f} or B_{d,f} is greater than 1/30 then some other control structure must be
used.


Control
Structure    $l_{a,c}$    $l_{a,f}$    $l_{d,f}$

ABCD            46           60           46
ACBD            66           60           60
ACDB            60           66           46
ADCB            60           46           60
ADBC            60           46           66
ABDC            46           60           60
ABDCD           60           66           50
ADBCD           66           60           50
ABCABD          40           66           75
ACDBCD          76           70           40
ADBADC          66           40           70

Latencies for static control structures
Figure 2-4
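
The worst-case latency of a cyclic control structure can be computed mechanically. The following is a minimal sketch, not the author's procedure. It assumes execution times of 10, 5, 10 and 5 time units for blocks A, B, C and D (hypothetical values, not stated in this section, chosen so that one pass through all four blocks takes 30 time units), and it assumes that the three latency columns correspond to the constraint paths AB, AD and CD. A token is taken to arrive just after the first block of the path has started, so it must wait for the next firing of that block.

```python
from itertools import accumulate

# Assumed execution times (hypothetical; not given in this section).
TIMES = {"A": 10, "B": 5, "C": 10, "D": 5}
# Assumed constraint paths behind the three latency columns.
PATHS = {"l_ac": "AB", "l_af": "AD", "l_df": "CD"}


def worst_case_latency(cycle, path, times=TIMES):
    """Worst-case latency of `path` when `cycle` is repeated forever."""
    trace = cycle * (len(path) + 2)      # enough repetitions to finish any match
    # clock[i] is the start time of trace[i]; clock[i + 1] is its finish time.
    clock = [0] + list(accumulate(times[b] for b in trace))
    worst = 0
    for s, block in enumerate(cycle):
        if block != path[0]:
            continue
        # The token arrives just after trace[s] starts, so the earliest
        # occurrence of the path that can carry it begins strictly after s.
        pos = s + 1
        for needed in path:
            pos = trace.index(needed, pos) + 1
        worst = max(worst, clock[pos] - clock[s])
    return worst


for cycle in ("ABCD", "ADBC", "ABCABD"):
    print(cycle, {name: worst_case_latency(cycle, p) for name, p in PATHS.items()})
```

The results depend entirely on the assumed execution times and paths; with different assumptions the same function can be used to compare candidate cycles against a given set of latency specifications.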

A slightly more complicated class of control structures is cycles where some
blocks may be executed more than once. For example, the control structure
ABCABD has worst-case latencies as shown in figure 2-4. This control structure will
satisfy its bandwidth constraints if $B_{a,c}$ is less than one every twenty time units
and $B_{a,f}$ and $B_{d,f}$ are less than one every forty-five time units.

1. However, such an algorithm is not practical, since the computation time it
requires would grow exponentially with the number of blocks.

The next class of control structures to consider are dynamic control structures
with static priority scheduling. These control structures make use of the current
environment to determine which blocks to fire next. The dynamic control structures
assume that the values of tokens at input links do not change continuously. When
the value of a token at an input link changes, a request is made for a set of tasks.
The request is serviced by firing a fixed sequence of blocks as specified by the
task. Since the processor is generally busy when a request occurs, the requests
are remembered until the processor is idle, when one of the requested tasks is
selected to be executed. Each task is assigned an integer priority. The task with
the highest priority is serviced next. The scheduler is static since the priority of
a task is always the same relative to other tasks. The earliest deadline scheduler,
by contrast, is an example of a dynamic priority scheduler, since the priority of a
task depends on its current deadline. If the task being serviced can be temporarily
suspended, the control structure is preemptive.

A dynamic control structure need not be interrupt driven. For example, the
control structure could sample the inputs between executing blocks. However,
preemptive control structures cannot be implemented without interrupts.

In the example of figure 2-3, there are many ways to construct tasks to be
requested by changing inputs. One such task system is to fire ABD (or ADB) when
the value at a changes, and CD when the value at d changes. The worst case
occurs when the values at a and d change simultaneously. The latencies for this
case are shown in figure 2-5. These latencies can be sustained only if the
bandwidths at a and d are both less than once every 35 time units (otherwise the
control structure would fall behind). In a sustained worst case, new tokens arrive
once every 35 time units. A trace of block firings would seem to indicate that the
static control structure ABDCD is being executed, which has latencies 15 to 20
units larger than those for the dynamic control structure. However, in the dynamic
case it is known exactly when the input signals change. In particular, the processor
will be idle if more than 35 time units elapse between changes in the input signals, so
the processor will be able to respond to a change immediately. In a static control
structure, the change would not be responded to until the control structure gets
around to it.
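
The simultaneous-request worst case can be traced by hand or with a few lines of code. The sketch below is illustrative only: it assumes a non-preemptive scheduler, hypothetical execution times (A = 10, B = 5, C = 10, D = 5, none of which is stated in this section), and the constraint paths AB, AD and CD for the three latencies. All tasks are requested at time 0 and run to completion in priority order.

```python
# Assumed execution times and constraint paths (hypothetical values).
TIMES = {"A": 10, "B": 5, "C": 10, "D": 5}
PATHS = {"l_ac": "AB", "l_af": "AD", "l_df": "CD"}


def simultaneous_latencies(tasks_high_to_low, paths=PATHS, times=TIMES):
    """Latency of each path when every task is requested at time 0."""
    clock = 0
    latency = {}
    for task in tasks_high_to_low:       # highest priority first, no preemption
        finish = []
        for block in task:
            clock += times[block]
            finish.append(clock)         # finish time of each block in this task
        for name, path in paths.items():
            if name in latency:
                continue
            pos = 0
            for needed in path:          # match the path as a subsequence of the task
                pos = task.find(needed, pos) + 1
                if pos == 0:             # block missing: this task does not serve the path
                    break
            else:
                latency[name] = finish[pos - 1]
    return latency


print(simultaneous_latencies(["CD", "ABD"]))
print(simultaneous_latencies(["ABD", "CD"]))
```

Swapping the order of the task list corresponds to swapping the two priorities, which shows how the choice of priorities trades latency on one path against latency on another.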


Task String
Priority 1    Priority 2    $l_{a,c}$    $l_{a,f}$    $l_{d,f}$

ABD           CD               30           35           15
ADB           CD               35           30           15
CD            ABD              15           20           35
CD            ADB              20           15           35

Latencies for dynamic control structures with static schedulers
Figure 2-5

3:  Static Control Structures

The main function of the control structure in a schema is to specify when to fire
each block. If the control structure is independent of the configuration (i.e.
unaffected by changes made by the environment) it is a static control structure.
An example of a static control structure is a loop which fires all of the blocks in
the schema cyclically. Control structures which make use of the configuration (e.g. via
interrupts) are called dynamic control structures.

The latency specification from a to b will be satisfied only if all the blocks along
all paths from a to b are fired at least once during each time interval of duration
$l_{ab}$ time units. Otherwise there would be time intervals longer than $l_{ab}$ when the
e-label at b will not change and therefore the age with respect to a of the token at
b will be greater than $l_{ab}$. Similarly, the bandwidth specification from a to b will be
satisfied if and only if the interval between firing the blocks along the constraint
paths is less than $1 / B_{ab}$.

For single processor control structures it is possible to construct a trace of the
blocks that are fired by the control structure. The trace is a string over an
alphabet $\Sigma$ whose elements correspond to the blocks of the schema. Each element
A of $\Sigma$ is assigned a weight (notation $|A|$) equal to $t_A$. The weight of a string is
defined to be the sum of the weights of its elements. A string $S_1$ contains $S_2$ if all
the elements of $S_2$ appear in $S_1$ in the order they appear in $S_2$. For example, the
string ABCDE contains the string BD, even though BD is not a substring of ABCDE.
Regular expressions will be used to denote sets of strings. In particular, if S is a
string, $S^*$ denotes the set of strings S, SS, SSS, ... as well as the empty string.
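
The weight and containment definitions translate directly into code. This is a small sketch with illustrative function names; the execution times in the usage lines are hypothetical.

```python
def weight(string, times):
    """Weight of a string: the sum of the weights of its elements."""
    return sum(times[block] for block in string)


def contains(s1, s2):
    """True if all elements of s2 appear in s1 in the order they appear in s2."""
    remaining = iter(s1)
    # `element in remaining` advances the iterator past the first match,
    # so later elements of s2 can only match later positions of s1.
    return all(element in remaining for element in s2)


times = {"A": 10, "B": 5, "C": 10, "D": 5, "E": 5}   # assumed block times
print(weight("ABCDE", times))
print(contains("ABCDE", "BD"), contains("ABCDE", "DB"))
```

Containment here is the subsequence relation, which is weaker than being a substring: BD is contained in ABCDE although the two blocks are not adjacent.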

It is necessary to model intervals in continuous time of arbitrary origin and
duration, since the latency specifications require all intervals of a specific duration to
contain the corresponding constraint path. Therefore the weight of the initial and
final elements of a string may be counted at less than their nominal weights. For
example, if $|a_1 \cdots a_n| = w$ (weighting $a_1$ and $a_n$ at $|a_1|$ and $|a_n|$), then
$[a_1 \cdots a_n]$ is a string of weight less than $w$, since both $a_1$ and $a_n$ are weighted
at less than $|a_1|$ and $|a_n|$. However, if the initial or final elements do not have
full weights, they may not be included as part of any contained string. Weighting
these elements at less than their full values corresponds to shrinking an interval of
size $w$ in continuous time: if the interval starts after $a_1$ starts executing, then the
interval does not contain $a_1$ reading its inputs. A string will be preceded by a '['
or followed by a ']' if the first or last element in the string is weighted at less than
its nominal value.

A single processor static control structure is completely specified by its trace,
which is determined at compile time (hence the name static control structure). The
real-time specifications on the control structure can be rephrased as constraints on
its trace. In particular, the latency specification from a to b is satisfied if and only
if all the constraint paths from a to b are contained in every substring in the trace
of weight $l_{ab}$. The bandwidth specification is satisfied if and only if the weights of
all substrings between occurrences of the constraint paths are less than $1 / B_{ab}$.

At this point it is possible to deal exclusively with the trace of the control
structure and the constraint paths. Constraint path i will be denoted $C_i$, with
latency specification $l_i$ and bandwidth specification $B_i$. If $C_i$ is a path from a to b,
$l_i = l_{ab}$ and $B_i = B_{ab}$. It will also be necessary to deal with the tails of the
constraint paths. If $C_i = c_{i,1} c_{i,2} \cdots c_{i,n}$, where $c_{i,j} \in \Sigma$, then the tail of $C_i$ is

    $C_{i,j} = c_{i,j} c_{i,j+1} \cdots c_{i,n}$

Since the control structure must satisfy the real-time specifications for all time,
the trace corresponding to the control structure will be an infinitely long string.
Since the control structure can be implemented only if the trace can be generated
using a finite program, it would be very awkward if the only feasible control
structures were acyclic. Fortunately, it can be proved that if any feasible control
structure exists, then there exists a feasible control structure that fires the blocks
in some cyclic order.


3.1:  Existence  of  Cyclic  Control  Structures 

The theorem proved in this section can be stated as:

Suppose there exists a string $\omega = a_1 a_2 a_3 \cdots$ such that $\omega$ satisfies
the real-time constraints. Then there also exists a finite string $\beta$ such that
the string $\beta^*$ also satisfies the real-time specifications.

This theorem will be proved using several lemmas.

Definition: A critical window of a control structure $\omega$ for the constraint $C_i$ is a
substring $\psi_i = a_k \cdots a_m$ of $\omega$ that contains two occurrences of $C_i$, but
$[\psi_i]$ contains no occurrences of $C_i$.

The most critical window for $C_i$ is the critical window with the greatest
weight.

Lemma 3-1: The string $\omega$ satisfies the latency specifications for $C_i$ if and only if
$|\psi_i| \le l_i$ for the most critical window $\psi_i$ in $\omega$.

Proof:

only if: Assume $\omega$ satisfies the real-time specifications. Then any substring
of $\omega$ of weight $l_i$ contains $C_i$. In particular, the substring
$[a_k \cdots a_m a_{m+1}]$ of weight $l_i$ must contain $C_i$. Since $[\psi_i]$ does not
contain $C_i$, the substring $[\psi_i$ of weight $l_i - \epsilon$, where $\epsilon$ is arbitrarily small,
contains one occurrence of $C_i$. Therefore, $|\psi_i| \le l_i + \epsilon$, $\epsilon > 0$.

if: Assume the most critical window $\psi_i$ has weight greater than $l_i$. Let $\gamma$
be any substring of $[\psi_i]$ where $|\gamma| = l_i$. $\gamma$ exists since, for sufficiently small $\epsilon$:

    $|[\psi_i]| = |\psi_i| - \epsilon > l_i$

Since $\psi_i$ is a critical window, $[\psi_i]$ contains no occurrences of $C_i$.
But $\gamma$ is a substring of $[\psi_i]$ and so also does not contain $C_i$. Hence, $\gamma$ is a
substring of $\omega$ of weight $l_i$ that does not contain the constraint path.
Therefore, $\omega$ does not satisfy the latency specifications. ■

Corollary: Since $\psi_i$ contains two occurrences of $C_i$, the period between
successive occurrences of $C_i$ must be less than $l_i - |C_i|$.

This lemma shows there is a time limit between the starts of successive
occurrences of $C_i$. The bandwidth specifications directly limit this interval.

Therefore, it will be assumed that the latency specifications are more severe than
the bandwidth specifications. If not, the latency specifications can be adjusted so
that:

    $l_i - |C_i| \le 1 / B_i$


The time remaining until the start of the next appearance of a constraint path is
called the laxity of that constraint. Given a control structure, we can construct a
table of laxities for each position in the corresponding string $\omega$ with the property
that the table entries are non-negative if and only if $\omega$ satisfies the latency
specifications. The only difficulty is in accurately determining the start of an
occurrence of a constraint string. This will be handled by keeping laxities for the
tails of the constraint strings. The true laxity for a string will be reflected in the
laxities of its tails if the start of the constraint path is falsely identified.

An element of the table $d[i,j,k]$ is the laxity for the path $C_{i,j}$ just before $a_k$ is
fired. The table should be thought of as rectangular with columns labeled by
elements of $\omega$. The entries in the first column are:

    $d[i,j,0] = l_i - |C_{i,j}|$    (3-1)

since the constraint path must occur by $l_i - |C_{i,j}|$. The remaining columns can
be filled in by simple recursion rules.

If the next element in $\omega$ is not the same as the first element in a constraint
path, the laxity for that path decreases by the weight of that element:

    $d[i,j,k+1] = d[i,j,k] - |a_k|$    (3-2)

There are two possibilities if the next element in the solution is the same as the
first element in a constraint path. If this is the start of an occurrence of a
constraint path, the laxity for the tail of that path should be no more than the
current laxity for the constraint path. It is possible that the tail will already have
a more severe laxity, since different constraint paths can have identical tails. In
addition, the laxity for the whole constraint path will become the original limit the
instant after the first element appears. Therefore, the laxity becomes the original
laxity minus the weight of the first element.

However, if $a_k$ is not the start of an occurrence of $C_{i,j}$, the laxity should
decrease by $|a_k|$. Fortunately, this problem will be handled automatically by
assuming that an occurrence of $C_{i,j}$ starts whenever $a_k = c_{i,j}$. If it is not part of
an occurrence of $C_{i,j}$, $c_{i,j}$ will appear again before all of $C_{i,j}$ appears. When this
happens, the laxity for $C_{i,j+1}$ will have decreased by the amount the laxity for
$C_{i,j}$ should have decreased if the start of the path had not been incorrectly
identified. When $c_{i,j}$ appears again, the laxity for $C_{i,j+1}$ will be less than the
laxity for $C_{i,j}$. Therefore, if $a_k = c_{i,j}$:

    $d[i,j+1,k+1] = \min(d[i,j,k],\ d[i,j+1,k] - |a_k|)$
    $d[i,j,k+1] = l_i - |C_{i,j}| - |a_k|$    (3-3)


Equations (3-2) and (3-3) can be transformed to produce rules for computing the
(k+1)st column of the laxity table from the kth column:

    $d[i,j,k+1] = l_i - |C_{i,j}| - |a_k|$                      if $a_k = c_{i,j}$
    $d[i,j,k+1] = \min(d[i,j-1,k],\ d[i,j,k] - |a_k|)$          if $a_k = c_{i,j-1}$    (3-4)
    $d[i,j,k+1] = d[i,j,k] - |a_k|$                             if $a_k \ne c_{i,j}, c_{i,j-1}$
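
The column-by-column rules can be sketched in code. This is an illustrative reading of rules (3-1) and (3-4), not the author's implementation: columns are indexed from 0, with column k holding the laxities just before the k-th element of the trace fires, and tails are indexed from 0 within each path. The execution times, constraint paths and latency specifications in the usage lines are assumptions chosen for the example.

```python
def laxity_table(trace, paths, latency, times):
    """Return the list of laxity columns for `trace`, following (3-1) and (3-4)."""

    def tail_weight(i, j):
        # Weight of the tail of path i that starts at its j-th block.
        return sum(times[b] for b in paths[i][j:])

    # Rule (3-1): initial laxity of every tail.
    column = {(i, j): latency[i] - tail_weight(i, j)
              for i in paths for j in range(len(paths[i]))}
    table = [column]
    for block in trace:
        step = times[block]
        nxt = {}
        for (i, j), d in column.items():
            if block == paths[i][j]:
                # An occurrence of this tail is assumed to start here.
                nxt[i, j] = latency[i] - tail_weight(i, j) - step
            elif j > 0 and block == paths[i][j - 1]:
                # The preceding element fired: inherit its laxity if tighter.
                nxt[i, j] = min(column[i, j - 1], d - step)
            else:
                # Unrelated block: time simply passes.
                nxt[i, j] = d - step
        table.append(nxt)
        column = nxt
    return table


TIMES = {"A": 10, "B": 5, "C": 10, "D": 5}        # assumed execution times
PATHS = {"a,c": "AB", "a,f": "AD", "d,f": "CD"}   # assumed constraint paths
LATENCY = {"a,c": 45, "a,f": 60, "d,f": 45}       # assumed latency specifications

table = laxity_table("ABCD" * 2, PATHS, LATENCY, TIMES)
feasible = all(d >= 0 for column in table for d in column.values())
print("all laxities non-negative:", feasible)
print("column 8 repeats column 4:", table[8] == table[4])
```

A negative entry anywhere in the table signals a violated latency specification, and a repeated column signals that the table has become periodic.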


As an example, figure 3-1 shows the laxity table for the control structure ABCD
and the block diagram module from figure 2-3.

In this table, the laxities at time 60 are identical to the laxities at time 30. The
next column in the table would be identical to the column at time 40. The rest of
the table becomes periodic, and all the entries are non-negative. The periodicity
allows us to prove that $(ABCD)^*$ will satisfy the latency specifications for all time.
This is formalized in the following lemmas:

Lemma 3-2: If:

    $\forall i,j:\ d'[i,j,m] \ge d[i,j,k]$ and $a_k = a'_m$

then:

    $\forall i,j:\ d'[i,j,m+1] \ge d[i,j,k+1]$

Lemma 3-3: If $\omega = \alpha\beta\gamma$ satisfies the latency specifications and:

    $\forall i,j:\ d[i,j,k] = d[i,j,m]$

then:

    $\omega' = \alpha\beta\beta\gamma = a'_1 a'_2 \cdots$

also satisfies the latency specifications.

Proof: Construct the laxity table $d'$ for $\omega'$:

    $\forall i,j:\ d'[i,j,0] = l_i - |C_{i,j}| = d[i,j,0]$

Since $a'_1 = a_1$, (3-4) leads to:

    $\forall i,j:\ d'[i,j,1] = d[i,j,1]$

Similarly:

    $\forall i,j,\ l \le m:\ d'[i,j,l] = d[i,j,l]$    (3-5)

Therefore:

    $\forall i,j:\ d'[i,j,m] = d[i,j,m] = d[i,j,k]$

From lemma 3-2:

    $\forall i,j:\ d'[i,j,m+1] \ge d[i,j,k+1] \ge 0$

Similar reasoning will show:

    $\forall i,j:\ d'[i,j,2m-k-1] \ge d[i,j,m-1] \ge 0$

Now $a'_{2m-k} = a_m$, so lemma 3-2 still applies:

    $\forall i,j:\ d'[i,j,2m-k] \ge d[i,j,m] \ge 0$

Inductively:

    $\forall i,j,\ l \ge m:\ d'[i,j,l+m-k] \ge d[i,j,l] \ge 0$    (3-6)

Combining (3-5) and (3-6):

    $\forall i,j,l:\ d'[i,j,l] \ge 0$

Therefore, from lemma 3-1, $\omega'$ satisfies the latency specifications. ■

Corollary: Let $\alpha = a_1 \cdots a_{k-1}$, $\beta = a_k \cdots a_{m-1}$, and $\gamma = a_m \cdots$. If $\omega = \alpha\beta\gamma$
satisfies all the latency specifications and $d[i,j,k] = d[i,j,m]$ for some
$k < m$, then $\alpha\beta^*$ also satisfies the latency specifications. The proof is by
induction. ■

The main theorem can now be proved by showing that any laxity table will have
duplicate columns and applying lemma 3-3:

Theorem 3-4: If any string $\omega$ satisfies the latency specifications then there exists a
string of the form $\beta^*$ which also satisfies the latency specifications.

Proof: Construct the laxity table for $\omega$. There are a finite number of
possibilities for each table entry, since each entry is $l_i - |C_{i,j}|$ minus a sum
of a finite number of $|a_k|$'s. The number of different $|a_k|$'s is limited by
the number of blocks in the block diagram schema. The number of terms in
the sum must be finite, since each $|a_k|$ is greater than zero and the laxity
entry is also greater than or equal to zero. Therefore, the possibilities for
each column are limited, and eventually some column in the table will be
repeated, so k and m satisfying the conditions of lemma 3-3 exist.

Applying the corollary to lemma 3-3 says a solution of the form $\alpha\beta^*$ exists.
However, $d[i,j,0] = l_i - |C_{i,j}| \ge d[i,j,k]$ for all k (the rules for filling in the
table never increase the laxities except to set $d[i,j,k]$ to $l_i - |C_{i,j}|$).
Applying lemma 3-2 shows that $\beta^*$ is also a solution. ■

The major implication of this theorem is that only cyclic strings need to be
considered for static control structures. These strings can be enumerated, so the
problem of finding a static control structure is in principle solvable. The proof
also places an upper bound on the length of the cycle (equal to the total number of
possible laxities at any position), so an algorithm that generated all possible strings
would be effective in the sense that it would always halt in a finite amount of time.
However, it would require computation time that grows exponentially with the
complexity of the schema, so the problem would be computationally intractable if
this were the only algorithm.

3.2:  Generating  Real-Time  Control  Structures 

The problem of generating a feasible control structure is a scheduling problem.
The problem is deterministic since the parameters of the problem are strictly
bounded as opposed to being unbounded random variables. A wide variety of
special cases of the general scheduling problem have been studied, and some
results are surveyed by Gonzalez [7], though relatively little work has been done
on scheduling in the presence of deadlines.

Gonzalez and Soh developed a simple algorithm that minimizes the number of
processors used to schedule independent tasks. The tasks are statically assigned
to processors and always run to completion. The deadline for each task
corresponds to the period of the requests for that task and must be a power of
two. Their algorithm is not optimal if the periods are not powers of two, and no
optimal algorithm is known, although several heuristic algorithms have been
investigated.

Liu and Layland considered the problem of scheduling independent tasks on a
single processor [14]. Each task requests service periodically with a deadline for
service coinciding with the time of the next request. They present a method of
assigning static priorities to the tasks that will meet the deadlines if any static
assignment of priorities will. In addition, they prove that the schedule which executes
the task whose deadline is earliest is optimal in the sense that it will meet the
deadlines if any schedule will. They then prove necessary and sufficient conditions
for a set of tasks scheduled by the earliest deadline (ED) algorithm to meet
all its deadlines, and conclude that the ED algorithm allows 100% utilization of the
processor as opposed to figures as low as 70% for static priority algorithms.
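
The figure of roughly 70% corresponds to the well-known least upper bound on processor utilization for fixed-priority scheduling of n periodic tasks, $n(2^{1/n} - 1)$, which tends to $\ln 2 \approx 0.693$ as n grows. A short sketch that evaluates the bound (the function name is illustrative):

```python
import math


def static_priority_bound(n):
    """Least upper bound on utilization for n periodic tasks: n * (2**(1/n) - 1)."""
    return n * (2 ** (1 / n) - 1)


for n in (1, 2, 3, 10, 100):
    print(n, round(static_priority_bound(n), 3))
print("limit:", round(math.log(2), 3))
```

Any task set whose total utilization stays below this bound can be scheduled with fixed priorities, whereas the earliest deadline rule can use the processor fully.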

Geiger extended the proof of the optimality of ED scheduling to include the case
where the requests are not periodic [6]. Flala presented the same basic proof and
also derived necessary and sufficient conditions for the ED scheduler with a mix of
periodic and aperiodic tasks [5].

Mok investigated scheduling independent tasks on multiple identical processors
[18]. Mok shows that no optimal algorithm exists for this problem unless the
deadlines, computation times and at least some future request times are known. An
algorithm related to the ED algorithm is presented which is shown to be optimal if
all requests are simultaneous. This algorithm executes those tasks with the least
laxity, where the laxity of a task is the deadline for the task minus its remaining
computation time. Unfortunately, both the least laxity and ED schedulers are shown
to be non-optimal even for tasks with periodic requests. However, the least laxity
scheduler is optimal for periodic deadlines where tasks may be executed at any
time (i.e. if the deadlines are coincident with the next request, the least laxity
scheduler is optimal if it is allowed to execute tasks before they have been
requested).

The problem of scheduling tasks related by a partial order on multiple identical
processors has been studied by Manacher [16]. Deadlines are specified for any or
all tasks in the system. Manacher's algorithm derives deadlines for all tasks in the
system by using the observation that a task must complete executing in time to
allow its successors to execute before their deadlines. The scheduler then
executes those tasks with the earliest deadlines that have had all their
predecessors executed. This algorithm is not optimal, and does not consider either
periodic requests or multiple start-times. However, it is a reasonable heuristic,
especially as the number of processors increases.

Unfortunately, none of these results generalize to the static control structure
problem, even for a single processor, although control structures could be
constructed which would meet the conditions of the particular special case and
satisfy the real-time constraints. For example, if the block diagram consisted of
unconnected (independent) blocks, the earliest deadline scheduler could be used
with task i being block i and the request period for each task being the minimum of
$l_i / 2$ and $1 / B_i$. The period between requests would have to be less than $l_i / 2$
since (in the absence of other information) it is possible for the task to be
executed immediately after one request and immediately before the following
deadline. Lemma 3-1 says this time interval must not be greater than $l_i$.

On the other hand, these heuristics are liable to be overly restrictive, particularly
since they tend to deal with independent tasks. It would be possible to derive
independent tasks from a block diagram schema by treating the constraint paths as
independent, but at the cost of introducing new blocks and much unnecessary
computation. One promising approach for deriving a static control structure is to
simulate some more general control structure until a cycle in the trace of that
control structure is found. An obvious choice of a more general control structure is
a least laxity scheduler (using laxities as defined for block diagram schemata) which
follows the partial order for the tasks (blocks) based on the constraint paths.
More precisely, the scheduler would build a laxity table, with starred entries
indicating constraint strings which cannot be fired because of the partial order.
The scheduler chooses the first block of the unstarred constraint string with the
smallest laxity to head the next column. If two constraints have the same laxity,
either can be fired next. Figure 3-2 shows such a laxity table for the block
diagram schema from figure 2-3 using the same latency specifications as figure 3-1.

Counter-Example to Least Laxity Scheduling
Figure 3-2

At time 40, none of the latency specifications have been violated. However,
since there are now two constraints with laxity 0, at least one entry in the next
column will be negative. By firing C at time 10, an additional request for C is
created with deadline 50. In the control robotics environment, the existence of
this request makes scheduling impossible. However, if B is fired and C is delayed
until time 15, the additional request also gets delayed to a point where it is
possible to schedule all the requests. The least laxity algorithm simply does not
deal with interactions between requests and deadlines.

It is interesting to note that the least laxity scheduler fails for this example even if the
constraint path AD is ignored. The remaining constraint paths AB and CD are
independent, yet they cannot be scheduled by the ED algorithm using the worst-case
period of $l_i / 2$. If periods are kept at $l_i - |C_i|$, the tasks still cannot be
scheduled by the ED scheduler if the individual blocks are scheduled separately.
The failure in this case can be viewed as an inability of the ED scheduler to derive
the proper phase relation between the tasks.

The schedule shown in figure 3-2 is not the only least laxity schedule. For
example, at time 25 CD has the same laxity as B and therefore C could be fired
instead of B. However, the reader can verify that all the least laxity schedules for
this example fail to satisfy the latency specifications.

3.3:  A Branch-and-Bound Method for Generating Control Structures

Rather than generating acyclic control structures and looking for a cycle, the
algorithm described in this section works by generating a cyclic control structure
that satisfies the real-time specifications for one of the constraint paths. The
solutions for other constraint paths are combined to form a control structure that
satisfies all the real-time specifications. The basic semantics of firing blocks rules
out control structures that are not shuffles of the constraint paths, since these
control structures perform redundant computations. Therefore, this algorithm should
not miss any solutions. There are two major problems that the algorithm has to
deal with: (1) How many times must each constraint path appear in one cycle of
the total control structure? (2) How should the constraint paths be combined into
one cycle?

3.3.1:  Determining  the  Relative  Frequency  of  Constraint  Paths 

The first step in the algorithm is to determine how many times each constraint
appears in one cycle of the total solution. Upper and lower bounds can be derived
from the length of the cycle and the basic latency specification. Consider the
lower bound on the number of appearances of constraint i: let $k_i$ be the number of
appearances of $C_i$ in one cycle of the solution, let $w_i = |C_i|$ and $c = |\omega|$.
Since the latency specification for $C_i$ requires $C_i$ to appear at least once every
$l_i$ time units:

    $k_i \ge \lceil c / (l_i - w_i) \rceil$    (3-7)

This leaves c (the length of the cycle) to be determined. However, if $C_i$ appears
$k_i$ times:

    $c \ge k_i w_i$    (3-8)

More precisely, the algorithm starts with the assumption that each block and
constraint appears once and that $c = \sum_A t_A$. This approximation is used to derive $k_i$
for all constraints in the schema. If any $k_i$ increases, this is used to update the
minimum number of times each block in the constraint must appear, which in turn
may cause c to increase. This process continues until all $k_i$ are consistent with c.
In practice, this only takes a few iterations.
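
The iteration can be sketched as follows. This is an illustrative reading rather than the author's code: it takes the lower bound (3-7) to be $k_i \ge \lceil c / (l_i - w_i) \rceil$, lets each block appear as often as the most frequent constraint that uses it, and recomputes the minimum cycle weight c until nothing changes. The function name, the round limit, and all numbers in the usage lines are assumptions.

```python
def relative_frequencies(paths, latency, times, max_rounds=100):
    """Iterate until every k_i is consistent with the minimum cycle weight c."""
    k = {name: 1 for name in paths}          # start: every constraint appears once
    for _ in range(max_rounds):
        # A block appears as often as the most frequent constraint containing it.
        count = {b: max([k[n] for n in paths if b in paths[n]], default=1)
                 for b in times}
        c = sum(count[b] * times[b] for b in times)
        new_k = {}
        for name, path in paths.items():
            w = sum(times[b] for b in path)
            slack = latency[name] - w        # assumed to be positive
            new_k[name] = max(k[name], -(-c // slack))   # ceiling of c / (l_i - w_i)
        if new_k == k:
            return k, c
        k = new_k
    raise ValueError("frequencies did not settle; the specifications may be infeasible")


TIMES = {"A": 10, "B": 5, "C": 10, "D": 5}        # assumed execution times
PATHS = {"a,c": "AB", "a,f": "AD", "d,f": "CD"}   # assumed constraint paths
print(relative_frequencies(PATHS, {"a,c": 45, "a,f": 60, "d,f": 45}, TIMES))
print(relative_frequencies(PATHS, {"a,c": 40, "a,f": 65, "d,f": 75}, TIMES))
```

The round limit guards against specifications that cannot be met at all, for which the required frequencies keep growing with the cycle weight and the iteration never settles.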

Theorem 3-4 places an upper bound on the number of blocks in a cycle, but this
bound is not directly applicable to the branch-and-bound algorithm since the
branch-and-bound algorithm does not try all cycles of a given length. An upper
bound on the number of appearances of any constraint can be easily derived if the
number of appearances of the other constraints is held constant.

First, an upper bound on the length of a cycle can be derived by applying
equation 3-7 to all constraints except constraint i. Then the minimum weight of a
cycle containing $k_j$ appearances of $C_j$ can be computed for all $j \ne i$. Letting $c_{max}$
be the maximum allowed cycle weight and c be the minimum cycle weight (not
including constraint i), the minimum weight of a cycle containing $k_i$ appearances of
$C_i$ is:

    $c + k_i w_i$    (3-9)

Therefore, the upper bound on $k_i$ can be derived by restricting the resultant cycle
weight to be less than $c_{max}$:

    $k_i \le (c_{max} - c) / w_i$    (3-10)

This ignores the possibility of blocks in $C_i$ already appearing in the cycle as part
of other constraints. However, including more appearances of constraint i will
eventually cause the minimum cycle length to exceed $c_{max}$.

This still does not bound the number of appearances for all constraints, since
constraint i can appear more often if constraint j appears more often, etc. Placing
an arbitrary bound on one constraint will also bound the number of appearances of
all other constraints. For example, requiring at least one constraint to appear only
once places a fairly tight bound on all constraints. However, it is not true that a
solution of this type always exists. An example is shown in figure 3-3.


3.3.2:  Strategies  for  Combining  Solutions 

Once the number of appearances per cycle of each constraint path is known,
the constraint paths can be permuted to form a control structure which satisfies all
the real-time specifications. Many of the techniques for improving the efficiency of
'branch-and-bound' optimization algorithms can be applied to this problem even
though it is not an optimization problem. An optimization problem seeks a
permutation of n objects that maximizes an evaluation function f of the
permutation.

Control Structure: (ABFDFCBFADEBFCF)*

Block Diagram Where All Constraints Appear More Than Once
Figure 3-3

A 'branch-and-bound' algorithm for this problem generates permutations for a
subset of the objects and extends these permutations to larger subsets. The
permutations of the subsets are called partial solutions, and are arranged in a tree.
Nodes in the tree correspond to partial solutions and the descendants of a node
are the extensions of that partial solution. Branch-and-bound algorithms are often
more efficient than direct enumeration since it is often unnecessary to examine the
entire search tree. The key to pruning the search tree is the dominance relation
on nodes of the tree. The evaluation function f can be extended to arbitrary
nodes of the search tree by defining the value of a non-terminal node to be the
maximum value of its descendants. Then node A dominates node B if and only if
f(A) > f(B). The branch-and-bound algorithm may prune any subtree whose root
node is dominated by some node of the tree that has already been explored.

In general, the dominance relation for a particular optimization problem cannot be
computed without examining the entire tree. However, it is often easy to compute
some weaker relation. These weaker relations are usually referred to as
dominance relations in the literature, so we will use the term strong dominance
relation to refer to the dominance relation that relates A to B if and only if
f(A) > f(B).

Branch-and-bound algorithms vary in the order in which the tree is searched and in
how the dominance relations are used to prune the search tree. Kohler and Steiglitz
classified branch-and-bound algorithms and initiated the theoretical study of
dominance relations [13]. They demonstrated the surprising result that pruning based
on a stronger dominance relation does not always improve the efficiency of the
algorithm. However, Ibaraki showed that stronger dominance relations do lead to more
efficient algorithms for several common classes of branch-and-bound algorithms [10].

Branch-and-bound algorithms as defined by Kohler and Steiglitz also make use of a
function g that places an upper bound on the value of f at each node. If L is the
maximum f(A) for leaf nodes A encountered, pruning sub-trees with g(A) ≤ L can
only improve the efficiency of the algorithm. However, the upper bound function
can also be viewed as a particular dominance relation.
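The pruning rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the evaluation function `example_f`, the bound `example_g`, and the weights in `WEIGHTS` are hypothetical and are not taken from the control structure problem. The sketch searches depth-first, keeps the best leaf value seen so far (L in the text), and skips any subtree whose upper bound cannot beat it.

```python
from typing import Callable, Dict, Sequence, Tuple

Perm = Tuple[str, ...]


def branch_and_bound(
    items: Sequence[str],
    f: Callable[[Perm], float],
    g: Callable[[Perm], float],
) -> Tuple[Perm, float, int]:
    """Maximize f over permutations of items.

    g(prefix) must be an upper bound on f for every completion of prefix.
    Returns (best permutation, best value, number of nodes expanded).
    """
    best_perm: Perm = ()
    best_value = float("-inf")  # L in the text: best leaf value so far
    expanded = 0

    def extend(prefix: Perm, remaining: Perm) -> None:
        nonlocal best_perm, best_value, expanded
        expanded += 1
        if not remaining:  # leaf: a complete permutation
            value = f(prefix)
            if value > best_value:
                best_perm, best_value = prefix, value
            return
        for i, item in enumerate(remaining):
            child = prefix + (item,)
            if g(child) <= best_value:
                continue  # prune: no completion can beat the best leaf
            extend(child, remaining[:i] + remaining[i + 1:])

    extend((), tuple(items))
    return best_perm, best_value, expanded


# Hypothetical evaluation function: negative weighted position cost.
WEIGHTS: Dict[str, int] = {"A": 3, "B": 1, "C": 2}


def cost(prefix: Perm) -> int:
    return sum(pos * WEIGHTS[b] for pos, b in enumerate(prefix))


def example_f(perm: Perm) -> float:
    return -cost(perm)


def example_g(prefix: Perm) -> float:
    # Remaining positions only add non-negative cost, so this bounds f above.
    return -cost(prefix)


best, value, nodes = branch_and_bound("ABC", example_f, example_g)
```

With these hypothetical weights the full tree has 16 nodes, and the bound lets the search skip some of the leaves.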

The control structure problem as stated is not an optimization problem. However,
it is still possible to define a dominance relation between nodes of the search tree:
node A strongly dominates node B unless B leads to a valid control structure and A
does not. Assuming the nodes at each level are generated in a random
(lexicographic) order, the best pruning for the algorithm to use is to retain the node
at each level which dominates the other nodes. If this dominance relation can be
easily computed, the algorithm can generate a valid control structure without
backtracking.

As a first step towards computing a dominance relation, define the slack for each
constraint to be the difference between the latency requirement and the latency
actually achieved by the control structure. The constraint with the least slack is
the most critical constraint (MCC). The slack in the MCC could also be used as a
value function to be maximized. If no control structure satisfies the real-time
constraints, the control structure maximizing the slack in the MCC is probably a
good 'close' solution. Also, the slacks may be used to evaluate any heuristic
algorithms for deriving control structures.

The latency achieved by a static control structure for a constraint $C_i$ is the
weight of the most critical window for $C_i$. Adding a block to the cycle of the
control structure cannot increase any slacks since the weight of some critical
window will be increased. The only exception would be if the new block completes
an additional occurrence of some constraint path, thereby creating new critical
windows. This cannot happen if the blocks being added are elements of some
other constraint path, since no constraint path is contained in another constraint
path. Therefore, the MCC slack can be used as an upper bound function in a
branch-and-bound algorithm to maximize the MCC slack. Upper bound functions are
also often used to guide the search in branch-and-bound algorithms. For example,
the algorithm could always expand the node with the greatest upper bound.

If the slacks in each constraint were reduced by the same amount when a new
block is added to the cycle, then the partial solution with the greatest MCC slack
would be a dominant solution. Unfortunately, this is not generally the case.

Consider dividing a cycle $w$ of the control structure $w^*$ into regions $\alpha_{i,j}$
and $\tau_{i,j}$, as shown in figure 3-4. The $\alpha_{i,j}$ regions contain one
occurrence of $C_i$, but $\tau_{i,j}$ contains no occurrence of $C_i$. The critical
windows of $C_i$ are $\omega_{i,j} = \tau_{i,j}\,\alpha_{i,j}\,\tau_{i,j+1}$. Therefore,
adding blocks to an $\alpha_{i,j}$ region increases the weight of $\omega_{i,j}$, and
adding blocks to a $\tau_{i,j}$ region increases the weight of $\omega_{i,j-1}$ and
$\omega_{i,j}$. Even if $|\omega_{i,j}|$ increases, the slack for $C_i$ will not decrease
unless $|\omega_{i,j}| = \max_k |\omega_{i,k}|$. The slacks cannot be used to compute a
dominance relation since the interdependence of constraint paths may force new
blocks to be added within the most critical window of some constraint, while another
solution with a smaller MCC slack might have a critical window of the right size in
the right place.

Regions of a Critical Window
Figure 3-4

Keeping vectors of slacks for each constraint path does not correct the problem.
Consider the example shown in figure 3-3 with the latency specification as shown
in figure 3-6. It can be easily verified that (ADEFCADBC)* is a feasible control
structure for this schema. It is also the only feasible control structure.¹ AD and
CF must appear at least twice in one cycle of the solution. Figure 3-5 shows
slacks for these constraints for two partial control structures. The merging of
(ADAD)* and (CFCF)* that leads to the solution is (ADFCADFC)*. However, the
slacks for CF in (ADCFADCF)* are larger and the slacks for AD are the same, so
(ADCFADCF)* would dominate (ADFCADFC)* even though it doesn't lead to a
solution.


3.3.3:  Performance  of  the  Algorithm 

Assume each constraint path contains an average of k blocks. The slack of a
constraint path in a trial cyclic solution can be determined in at most k scans of
the cycle. If there are n constraint paths there will be $O(nk)$ scans of each trial
solution generated by the algorithm. The trial cycles will be $O(nk)$ blocks long (this
ignores the possibility of a constraint appearing several times in one cycle). The
overall time complexity of the algorithm will be $O(n^2k^2)$ times the number of trial
cycles generated per problem.

Counter-Example to Slack as a Dominance Relation
Figure 3-6

1. This was verified by checking all cyclic control structures that might be
generated by a branch-and-bound algorithm assuming that the least critical
constraint only appears once per cycle.

Assume the trial cycle contains $m_1$ blocks and the next constraint path contains
$m_2$ blocks. There are $(m_1 + m_2)!$ cycles containing all the blocks, but we are
only interested in one of the $m_1!$ permutations of the blocks in the old cycle, and
$m_2$ permutations of the blocks in the new constraint (i.e. we must consider
$m_2$ different phase relations of the two cycles). Therefore, the number of different
trial cycles generated at this step is:

$$ \frac{(m_1 + m_2 - 1)!}{m_1!\,(m_2 - 1)!} = \binom{m_1 + m_2 - 1}{m_2 - 1} \tag{3-11} $$
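To get a feel for how quickly this count grows, the expression can be evaluated directly. The sketch below assumes that (3-11) reads as $(m_1 + m_2 - 1)!$ divided by $m_1!\,(m_2 - 1)!$; the function name `trial_cycles` and the sample sizes are illustrative only.

```python
from math import comb, factorial


def trial_cycles(m1: int, m2: int) -> int:
    """Number of trial cycles when a path of m2 blocks is merged into a
    cycle of m1 blocks, under the assumed reading of equation (3-11)."""
    return factorial(m1 + m2 - 1) // (factorial(m1) * factorial(m2 - 1))


# Merging paths of k = 3 blocks into a cycle that grows by k blocks per step.
k = 3
growth = [trial_cycles(step * k, k) for step in range(1, 5)]
```

Each entry of `growth` is the count for one merge step, so the list shows how the per-step branching increases as the cycle lengthens.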


Of course, if some blocks of the new constraint are already contained in the old
cycle, or if the next constraint appears more than once, not all of the generated
cycles will be distinct. However, it is rather difficult to avoid generating these
cycles. There will be relatively little extra cost to the algorithm as long as it does
not investigate cycles that are identical to cycles that have already led to
failures. Therefore, the number of trial cycles generated by the merging algorithm
when it finds a solution without backtracking is approximately:


/- 2 v ' 


(3-12) 


Equation (3-12) is $O(kn^{k+1})$ since the binomial term in the sum is $O(n^k)$ and there
are n terms.

If  the  merging  algorithm  fails  to  find  a solution,  then  it  must  have  backtracked 
through  each  trial  solution  and  the  total  number  of  cycles  generated  is: 


(i+*(2vi)(i+  • • ■ d^ev))  • ■ ' > 


which  can  be  approximated: 


1-2  v * ’ 


(3-13) 


(3-14) 


Equation (3-14) is $O((kn^k)^n)$ or $O(k^n n^{kn})$, and is exponential in the number of
blocks in the schema. This is a very loose upper bound and would only be
achieved if all generated solutions were plausible except when the last constraint
was being merged in. However, this bound is achievable if the first n-1 constraint
paths had relatively large latency specifications while the last constraint path had
relatively small latency specifications. This situation can be easily avoided by
starting with the path with the smallest latency constraints relative to the weight
of the path.
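The ordering rule suggested here amounts to a sort. The following is a minimal sketch, assuming each constraint path is summarized by its latency specification and its weight; the path names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def merge_order(paths: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order constraint paths for merging, tightest first.

    paths maps a path name to (latency specification, path weight).
    A small latency relative to the weight means little room to spare,
    so that path is merged first.
    """
    return sorted(paths, key=lambda name: paths[name][0] / paths[name][1])


# Hypothetical paths: name -> (latency, weight)
example_paths = {"AD": (11.0, 2.0), "BF": (7.0, 2.0), "CF": (10.0, 4.0)}
order = merge_order(example_paths)
```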


3.3.4: Speeding up the Algorithm

There are many ways the average performance of the algorithm could be
improved. For example, if we had a tighter bound on the slack in the MCC,
we could prune more subtrees. We can get a tighter bound by determining what
new blocks must be added to the control structure. Adding a new block always
increases the size of some critical window for a constraint by at least the weight
of the block. Therefore, if the sum of the slacks for a constraint is less than the
total weight of blocks that must be added to the control structure, at least one of
the critical windows for that path will exceed the latency specification for that
path. This tighter bound has no effect on the performance if no backtracking is
necessary. However, if no solution is found, using the tighter bound is roughly
equivalent to reducing n, since fewer constraints need to be combined before the
control structure is recognized as infeasible.

Notice that the performance of the algorithm would not be of polynomial
complexity even if there were a dominance relation that totally ordered the
possibilities at each level. The problem is that the number of partial solutions that
must be generated by a naive algorithm can grow exponentially with the complexity
of the schema. Therefore, finding a good dominance relation is not as important as
finding a search function that generates nodes that are most likely to lead to a
solution first.

Since the weights of the critical windows increase when new blocks are added,
we might try merging in new constraint paths so that no new blocks are added
before trying more general mergings. This will improve the performance if the
solution is an extension of this type of merging, even if the algorithm must
backtrack, since fewer nodes are generated on that level. If the algorithm must
backtrack through all the control structures of this type, the performance of the
algorithm is somewhat worse. The effect of this heuristic may be approximated by
reducing k, since the length of the strings merged into the current control structure
will be reduced.

The other way of improving the performance of the algorithm is to reduce the
complexity of the problem. This can be done by replacing sub-graphs of the block
diagram module with new blocks. Whenever the new block is fired, the blocks
comprising the subgraph replaced by the new block are fired in some fixed order.
This replacement can dramatically reduce k, and would improve both the best- and
worst-case performance. However, combining blocks in this way can result in a
schema which has no feasible control structures even though the original schema
does.

Since the process of generating a control structure can be so time consuming, it
would be extremely useful to quickly identify real-time specifications that are
impossible to satisfy. One way of doing this is to compute the percentage of CPU
time required by each block. If the sum of this percentage over all blocks in the
schema is greater than 100%, the latency specifications are obviously unsatisfiable.

The percentage of the CPU required by each block is easily computed: each
constraint must be executed at least once every $l_i - |C_i| + \epsilon$ time units.
Therefore, each block $c_{i,j}$ in $C_i$ must be executed at least once every
$l_i - |C_i| + \epsilon$ time units and its corresponding CPU percentage is:

$$ \frac{|c_{i,j}|}{l_i - |C_i| + \epsilon} $$

If a block appears in several constraints, its CPU percentage is the maximum of
the percentages implied by each constraint the block appears in. Using the
maximum rather than the sum corresponds to assuming that each time the block is
fired it will help satisfy all the constraints it appears in. Although this is not
necessarily the case, it is a lower bound on the CPU usage.
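This quick test is simple enough to sketch in Python. The code below is an illustration under stated assumptions: it ignores the small $\epsilon$ term, it takes each block's CPU fraction to be its weight divided by $l_i - |C_i|$, and the block weights and latency values in the example are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Constraint = Tuple[Tuple[str, ...], float]  # (blocks on the path, latency)


def cpu_fraction_lower_bound(
    weights: Dict[str, float], constraints: Iterable[Constraint]
) -> float:
    """Lower bound on the fraction of the CPU needed by all blocks.

    Each block on a path must run at least once every (latency - path
    weight) time units (epsilon ignored). A block on several paths is
    charged the maximum fraction, not the sum. Blocks on no path are
    charged nothing.
    """
    demand: Dict[str, float] = {}
    for path, latency in constraints:
        path_weight = sum(weights[b] for b in path)
        period = latency - path_weight
        if period <= 0:
            return float("inf")  # latency cannot even cover the path itself
        for block in path:
            fraction = weights[block] / period
            demand[block] = max(demand.get(block, 0.0), fraction)
    return sum(demand.values())


# Hypothetical schema: five blocks of unit weight, three constraint paths.
unit_weights = {b: 1.0 for b in "ABCDF"}
tight = [(("A", "D"), 4.0), (("C", "F"), 6.0), (("A", "B", "F"), 8.0)]
loose = [(("A", "D"), 12.0), (("C", "F"), 12.0), (("A", "B", "F"), 13.0)]

tight_load = cpu_fraction_lower_bound(unit_weights, tight)
loose_load = cpu_fraction_lower_bound(unit_weights, loose)
```

A result above 1.0 means the specifications cannot be met by any control structure; a result at or below 1.0 proves nothing, since the test is only a necessary condition.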

Another quick test for unsatisfiable latency specifications is that the slack in
each latency specification must be larger than the computation time for all blocks
not contained in that constraint path. Otherwise, the portion of some critical
window for that constraint that contains no occurrence of the constraint will be too
large (refer to figure 3-4).


3.3.5:  Practical  Experience 

A branch-and-bound algorithm similar to the one described above has been
implemented as part of a system for implementing continuous-time block diagrams on
conventional micro-processors. The implementation runs on a PDP-11/70 under the
UNIX timesharing system. The block diagram is described using an interactive
graphics editor developed by John Pershing [18]. The branch-and-bound algorithm
is only responsible for choosing the order in which to execute the blocks. The object
code for the block diagram is produced by a separate program.

The program uses all of the heuristics mentioned above except that it does not
combine sub-graphs into new blocks. The program is able to find control structures
to satisfy most latency specifications for small block diagrams using less than a
minute of CPU time. So far, only one set of latency constraints has been found
where a valid control structure exists but no control structure was found by the
program (see figure 3-3). Some latency specifications require more time to find a
valid control structure.

In the absence of a fast optimal algorithm, it is preferable to have a fast
algorithm which yields 'good' control structures quickly. Heuristic algorithms are
generally evaluated in one of two ways. One approach chooses a fixed algorithm and
derives an upper (or lower) bound on how far the algorithm's solution is from the
optimal solution. For example, Graham's algorithm for scheduling independent tasks
on multiple processors executes tasks which require more processing time first.
The resulting schedule is no more than 4/3 times as long as the optimal schedule
[8].

The other approach develops a family of algorithms each requiring polynomial
time. As the degree of the polynomial increases, the solutions found by the
programs are closer to optimal. The family of algorithms is monotonic in the sense
that an algorithm taking more time never produces a poorer solution than one
taking less time. If the degree of the polynomial were increased to infinity the
algorithm would be optimal. However, it would also no longer be polynomially time
bounded. An example is a series of scheduling algorithms employing limited
lookahead [1].

The second approach does not seem applicable to the control structure problem.
Limiting the breadth of back-tracking yields a family of exponential time algorithms
with the exponent increasing with the amount of back-tracking. A family of
polynomial algorithms would result if at most k blocks were merged at a time with
no backtracking. However, these algorithms are very unsatisfactory if any
constraint must appear more than once. If the number of blocks in the constraint
path is less than k, then all blocks for the second (and subsequent) appearance of
the constraint will be merged coincident with the existing occurrences of those
blocks. If k is increased so this does not happen, the performance of the algorithm
is only slightly better than the complete algorithm with no backtracking.


3.4:  Heuristics  for  Generating  Control  Structures 

Steve Ward has experimented with some quick, simple heuristics for generating
static control structures. Basically, the heuristic constructs control structures of
the form $(\alpha\beta\alpha\gamma\alpha\delta \cdots)^*$ where $\alpha$ is the most critical
constraint path and $\beta$, $\gamma$, $\delta$, et cetera are taken from the other
constraint paths. More specifically, blocks from the next most critical constraint
are added to $\beta$ with the restriction that $|\beta|$ is less than $l_\alpha$. If more
blocks remain in the constraint they are added to $\gamma$ so that $|\gamma|$ is less
than $l_\alpha$. Once all constraints have been merged in this way, the latency
specifications are checked. If they are all satisfied then the generated string is
a feasible control structure.

The heuristic will also call itself using the current solution as $\alpha$ so the
generated solution may also be of the form:

$$ ((\alpha\beta\alpha\gamma \cdots)\,\delta\,(\alpha\beta\alpha\gamma \cdots) \cdots)^* $$

Since these heuristics construct a control structure rather than search for one,
they run very quickly. However, they also do not find solutions to a fairly large
number of latency specifications, even for simple block diagrams. Still, these
heuristics are more attractive as a basis for an approximate algorithm, not only
because of their speed but also because they could be extended to handle
particular styles of block diagrams as the process of constructing control
structures becomes better understood.




4: Static Priority Interrupt Control Structures

In some applications, the tokens at the input links do not change continuously. If
the control structure can detect when an input changes, the real-time performance
can be improved. Intuitively, this is possible since if no inputs to a block have
changed, that block does not need to be executed. On the average, this type of
control structure ought to do less computation and therefore ought to have better
real-time performance. On the other hand, better average performance does not
guarantee better worst-case performance and specific questions of performance
must be answered with respect to a particular model.

Although the prototypical example of a dynamic control structure is interrupt
driven, it is important to realize that hardware interrupts are not necessary. For
example, a control structure could sample the inputs until one or more inputs
change. After all the computation initiated as a result of these changes had
completed, the control structure would continue to sample the inputs. In general,
such a scheme would risk missing changes in the inputs. However, the control
structure can use the real-time specifications to guarantee this will not happen.

4.1: Dynamic Control Structures

Many of the strategies for scheduling independent tasks to satisfy real-time
constraints mentioned in the previous chapter use dynamic control structures. For
example, Liu and Layland use static priority interrupts and consider the case (in our
terms) where the latency is equal to the period between requests [14]. They
consider the earliest deadline scheduler only in this context although the earliest
deadline scheduler is optimal for any sequence of requests and deadlines, as
mentioned earlier.

Given an optimal scheduler, is there any reason to consider a suboptimal
scheduler? The answer will be yes if a good suboptimal scheduler exists which
uses fewer resources than the optimal scheduler. The earliest deadline scheduler
needs to find the highest priority task to execute whenever a task completes
(alternately, it needs to insert requests into the proper position in a task queue).
A static priority interrupt control structure also needs to find the highest priority
task to execute. However, this is done in hardware by many existing computers,
including current microcomputers. Also, the earliest deadline scheduler requires a
real-time clock to compute the deadlines for each task from the request time and
the latency specification. Therefore, static interrupt control structures are
sufficiently simpler than an earliest deadline control structure to deserve further
consideration.

4.2:  Model  for  Static  Interrupt  Control  Structures 

A static interrupt control structure associates a task with each block in the
diagram. The tasks are related by a precedence relation consistent with the block
diagram. Each task has a priority and may be idle, active, or requested. The
priority may be thought of as an integer with numerically greater priorities being
better.

When an input changes, all tasks whose blocks are watchers of that input
become requested. The control structure chooses the task with the highest
priority among the requested tasks. This task is active until the block completes
executing, at which point all its successor tasks become requested and the task
itself becomes idle. If the control structure allows active tasks to be suspended
while another task is executed, the control structure is called preemptive. Otherwise
it is non-preemptive. Unless otherwise noted, control structures are assumed to be
preemptive.

The latency performance of any static interrupt control structure can be
determined for each task by adding the computation time for that task to the
maximum computation time used by higher priority tasks while the task is on the
ready queue. The difficulty in this analysis is in determining how much computation
might be used by other tasks.

The simplest case to consider is when all the tasks are independent (each task
consists of exactly one block). Each task i requires $t_i$ units of computation, and
has priority $p_i$, latency $l_i$, and bandwidth $B_i$. Without loss of generality, the
tasks can be numbered so that:

$$ p_1 \ge p_2 \ge \cdots \ge p_n $$

The overhead associated with interrupts, selecting a task for execution, etc.
will be ignored for the time being. We shall also assume that all priorities are
distinct.

The latency for task i when its inputs change discretely is simply the maximum
elapsed time between a change in an input and the termination of the task. This
must be less than $l_i$ if the latency specification for task i is to be satisfied. The
interpretation of the bandwidth specification is also simplified. Instead of
specifying a minimum rate for sampling inputs, the bandwidth specifies the maximum
rate at which an input changes.

The latency specification for task i will be satisfied if and only if the block for
task i can be completely executed during any time interval of duration $l_i$. During
this interval, tasks with priority better than $p_i$ will also be run, and the amount of
CPU time used by higher priority tasks must be less than $l_i - t_i$.

Notice that this model is equivalent to the model used by Fiala. Fiala's $P_i$
corresponds to $t_i$, $D_i$ corresponds to $l_i$, and Fiala's period between requests
corresponds to $1/B_i$. Therefore, for a single processor we have the obvious
restrictions:

$$ B_i t_i \le 1 \tag{4-1} $$

and:

$$ \sum_{i=1}^{n} B_i t_i \le 1 \tag{4-2} $$

The summands in (4-2) are the fraction of CPU time used by task i. Obviously the
total fraction of the CPU used by all the tasks must not exceed one. Equation
(4-1) can be derived from (4-2).

Lemma 4-1: The amount of CPU time used by n independent tasks using a static
priority scheduler in a window of duration $\Delta t$ does not depend on the
relative priority of the tasks.

Proof: The processor is always busy if some task is requesting service.
Changing the priorities of the tasks will never cause the processor to
remain idle when some task requests service, nor will it affect when the
tasks request service.
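Lemma 4-1 can be checked experimentally with a small discrete-time simulation. The sketch below is not part of the original analysis: it assumes integer periods and computation times, one unit of work per time step, preemption at step boundaries, and that unserved requests accumulate. The task set is hypothetical.

```python
from itertools import permutations
from typing import Dict, Sequence, Tuple


def busy_time(
    tasks: Sequence[Tuple[int, int]], priority: Sequence[int], horizon: int
) -> int:
    """Units of CPU time used in [0, horizon) by a preemptive static
    priority scheduler.

    tasks[i] = (period, compute): task i requests service at every
    multiple of its period and needs `compute` units per request.
    priority lists task indices, best priority first.
    """
    pending = [0] * len(tasks)  # outstanding work per task
    busy = 0
    for now in range(horizon):
        for i, (period, compute) in enumerate(tasks):
            if now % period == 0:
                pending[i] += compute
        for i in priority:  # run the best-priority task that has work
            if pending[i] > 0:
                pending[i] -= 1
                busy += 1
                break
    return busy


# Hypothetical task set: (period, compute)
example_tasks = [(4, 1), (6, 2), (10, 3)]
results: Dict[Tuple[int, ...], int] = {
    order: busy_time(example_tasks, order, horizon=25)
    for order in permutations(range(len(example_tasks)))
}
```

Every priority ordering in `results` yields the same busy time, as the lemma predicts; only the identity of the task running at each instant changes.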

Since the control structure only executes a task if some input to the task
changes, task i cannot be executed more often than once every $1/B_i$ time units.
Clearly, a task uses the maximum CPU time in any interval if it requests service at
this maximum rate.

Assume task i requests service at times $0, 1/B_i, 2/B_i, \ldots$, and let $C_i(t)$ be
the maximum amount of CPU time used by task i in the interval $(0, t)$. The highest
priority task (task 1) always starts executing immediately after it requests service
and executes for $t_1$ time units, so it will be executed $\lfloor B_1 t \rfloor$ complete
times in the interval. Let $r = t - \lfloor B_1 t \rfloor / B_1$ be the amount of time at the
end of the window after the last request for task 1. Task 1 will be executing during
the interval $(t-r, t)$ since task 1 has the highest priority. However, if $r > t_1$,
only $t_1$ units of computation will be used so:

$$ C_1(t) = \lfloor B_1 t \rfloor\, t_1 + \min\left(t_1,\; t - \frac{\lfloor B_1 t \rfloor}{B_1}\right) \tag{4-3} $$


The maximum amount of CPU time used by task 1 in the interval $(\Delta t, t + \Delta t)$ is:

$$ C_1(t + \Delta t) - C_1(\Delta t) \tag{4-4} $$

We will show that this is maximized when $\Delta t = 0$ by showing:

$$ C_1(t + \Delta t) - C_1(\Delta t) \le C_1(t) $$

or

$$ C_1(t + \Delta t) - C_1(t) \le C_1(\Delta t) \tag{4-5} $$

Since the requests for task 1 occur with a regular period, $C_1(t)$ is also periodic.
In fact:

$$ C_1(t + 1/B_1) - C_1(t) = t_1 \tag{4-6} $$

Therefore, we need only consider $\Delta t$ between 0 and $1/B_1$, in which case:

$$ C_1(\Delta t) = \min(t_1, \Delta t) \tag{4-7} $$

This is the maximum amount of CPU time used in any interval of duration $\Delta t$
since the CPU time used cannot be greater than the duration of the interval nor
can it be greater than $t_1$ if the interval contains less than one period. Therefore,
the inequality in (4-5) holds since the left hand side is the amount of CPU time
used in an interval of duration $\Delta t$ starting at t.

The worst case for a set of tasks will occur when all tasks request service at
time 0 and continue requesting service at their respective maximum rates. This is
true since the highest priority task will use its maximum amount of CPU time under
these conditions, and by lemma 4-1, any task can be made the highest priority task
without affecting the amount of CPU time used by the set of tasks.

Define $C_i(t)$ by:

$$ C_i(t) = \lfloor B_i t \rfloor\, t_i + \min\left(t_i,\; t - \frac{\lfloor B_i t \rfloor}{B_i}\right) $$
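As a concrete illustration, the function can be written out directly. This sketch assumes the definition is the complete-executions term plus the capped remainder, $\lfloor B_i t \rfloor t_i + \min(t_i, t - \lfloor B_i t \rfloor / B_i)$; the name `cpu_demand` and the sample task (2 units of computation, bandwidth 1/4) are illustrative. Exact rational arithmetic avoids rounding trouble in the floor.

```python
from fractions import Fraction
from math import floor


def cpu_demand(t: Fraction, compute: Fraction, bandwidth: Fraction) -> Fraction:
    """C_i(t): maximum CPU time used in (0, t) by a task that needs
    `compute` units per request and requests service every 1/bandwidth."""
    complete = floor(bandwidth * t)        # requests whose full period has elapsed
    remainder = t - complete / bandwidth   # time since the last request
    return complete * compute + min(compute, remainder)


t1, b1 = Fraction(2), Fraction(1, 4)       # hypothetical task
demand_at_5 = cpu_demand(Fraction(5), t1, b1)
demand_at_9 = cpu_demand(Fraction(9), t1, b1)
demand_at_16 = cpu_demand(Fraction(16), t1, b1)
```

Advancing t by one period (here 4 time units) raises the demand by exactly the computation time, which is the periodicity property used for task 1 in the derivation above.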

The amount of CPU time used by tasks j and k is not necessarily $C_i(t)$ summed
over j and k. The difficulty is that if requests for tasks j and k occur sufficiently
near the end of the window and of each other then only the higher priority task will
actually be executed. Therefore, it is necessary to determine a precise schedule
for the interval from 0 to t. However, if we are only interested in how much CPU
time is used in this interval, lemma 4-1 assures us that we may assign arbitrary
priorities to tasks j and k.

However, a sufficient condition for satisfying the latency specification for task i
is:

$$l_i \ge t_i + \sum_{j=1}^{i-1} C_j(l_i) \tag{4-8}$$

This equation can be made more intuitive if the time required by task j is
approximated by:

$$C_j(t) \approx B_j t_j t \tag{4-9}$$

Then equation (4-8) becomes:

$$l_i \ge t_i + \sum_{j=1}^{i-1} B_j t_j l_i \tag{4-10}$$

This can be rewritten as:

$$l_i \ge \frac{t_i}{1 - \sum_{j=1}^{i-1} B_j t_j} \tag{4-11}$$

The denominator in equation 4-11 represents the fraction of CPU time available to
task i. The effect of higher priority tasks is equivalent to reducing the CPU speed.
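The two forms of the latency condition can be compared numerically. The sketch below assumes the demand function $C_j(t) = \lfloor B_j t \rfloor t_j + \min(t_j, t - \lfloor B_j t \rfloor / B_j)$ and evaluates both the right-hand side of (4-8) and the approximate bound (4-11). All function names are hypothetical, and higher priority tasks are given as `(period, cost)` pairs with period $= 1/B_j$.

```python
from fractions import Fraction

def cpu_demand(t, period, cost):
    """Assumed form of C_j(t); period = 1/B_j, cost = t_j."""
    t, period, cost = Fraction(t), Fraction(period), Fraction(cost)
    full = t // period
    return full * cost + min(cost, t - full * period)

def required_latency(cost, latency, higher):
    """Right-hand side of (4-8): own CPU time plus the worst-case demand of
    every higher priority task over a window of length `latency`."""
    return cost + sum(cpu_demand(latency, p, c) for p, c in higher)

def approximate_latency(cost, higher):
    """Bound (4-11): t_i / (1 - sum of B_j * t_j). Returns None when the
    higher priority tasks leave no CPU time at all."""
    load = sum(Fraction(c) / Fraction(p) for p, c in higher)
    if load >= 1:
        return None
    return Fraction(cost) / (1 - load)

# Values from the figure 4-1 example: t_2 = 12, l_2 = 16, with task 1
# (t_1 = 2, 1/B_1 = 4) at higher priority.
print(required_latency(12, 16, [(4, 2)]))
print(approximate_latency(12, [(4, 2)]))
```

With task 1 using half of the processor, the approximation treats task 2 as running on a CPU of half speed, which is the intuition stated for the denominator of (4-11).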


4.3:  Assigning  Priorities  to  Independent  Tasks 

One of the weaknesses of traditional real-time operating systems based on
static priority scheduling is that the system does not verify that the priorities
assigned by the user are consistent with his real-time specifications. Even if the
system checked these specifications, the user still must assign priorities, which do
not have a simple relation to the real-time specifications. The obvious strategy of
assigning the highest priority to the task that requires the fastest response time
does not work. Consider the example in figure 4-1. Either task 1 or task 2 can
run at the best priority since $l_1 = l_2$. If $p_1 > p_2$, the latency for task 2 is:

$$t_2 + \lfloor l_2 B_1 \rfloor t_1 + \min(t_1,\; l_2 - \lfloor l_2 B_1 \rfloor / B_1) = 12 + 8 + \min(2, 0) = 20 > l_2 = 16$$

However, the latency for task 1 if $p_2 > p_1$ is:

$$t_1 + \lfloor l_1 B_2 \rfloor t_2 + \min(t_2,\; l_1 - \lfloor l_1 B_2 \rfloor / B_2) = 14 \le l_1 = 16$$

Task 1: $t_1 = 2$, $1/B_1 = 4$, $l_1 = 16$
Task 2: $t_2 = 12$, $l_2 = 16$

Counter-example to priority ∝ 1 / latency
Figure 4-1

The algorithm successively finds a task that can satisfy its latency
specification while assigned the lowest priority. If there are several such tasks,
one is chosen arbitrarily. This task is assigned the lowest priority and removed from
the set of tasks. The next task selected will be assigned a priority higher than all
previously assigned priorities but lower than all tasks still unassigned. This
continues until no task remains or no task can be found that can execute at a
priority lower than all other remaining tasks. In the latter case, no assignment of
static priorities will satisfy all the latency specifications using only one processor.
This algorithm will never make a bad choice. Consider the situation when one or more
tasks remain yet no task can be assigned the lowest priority. Any task that could
possibly run at a lower priority has already been assigned a lower priority.
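The procedure described above can be sketched as follows. This is a minimal illustration, not the original implementation: it assumes the latency test is condition (4-8) with the demand function $C_j(t) = \lfloor B_j t \rfloor t_j + \min(t_j, t - \lfloor B_j t \rfloor / B_j)$, and the names `cpu_demand`, `fits_lowest`, and `assign_priorities`, as well as the task sets, are hypothetical.

```python
from fractions import Fraction

def cpu_demand(t, period, cost):
    """Assumed form of C_j(t); period = 1/B_j, cost = t_j."""
    t, period, cost = Fraction(t), Fraction(period), Fraction(cost)
    full = t // period
    return full * cost + min(cost, t - full * period)

def fits_lowest(name, tasks):
    """True if `name` meets its latency when every other task in `tasks`
    has higher priority. tasks maps name -> (cost, period, latency)."""
    cost, _period, latency = tasks[name]
    interference = sum(cpu_demand(latency, p, c)
                       for other, (c, p, _l) in tasks.items() if other != name)
    return cost + interference <= latency

def assign_priorities(tasks):
    """Return task names ordered from lowest to highest priority, or None
    if some remaining task set has no member that can run at the bottom."""
    remaining = dict(tasks)
    order = []
    while remaining:
        candidate = next((n for n in remaining if fits_lowest(n, remaining)), None)
        if candidate is None:
            return None                  # no static assignment exists
        order.append(candidate)          # next-lowest priority level
        del remaining[candidate]
    return order

# Hypothetical task set: name -> (cost, period, latency).
tasks = {"a": (1, 4, 2), "b": (2, 8, 6), "c": (3, 16, 16)}
print(assign_priorities(tasks))
```

Each pass tests at most every remaining task once, so the number of latency checks grows quadratically with the number of tasks.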

4.4:  More  Complex  Models 

The  model  for  static  interrupt  control  structures  made  several  simplifying 
assumptions,  such  as  ignoring  scheduling  overhead,  assuming  preemptive  scheduling 
and  distinct  priorities.  The  model  can  be  easily  changed  to  account  for  different 
assumptions. 

4.4.1:  Scheduling  Overhead 

When a task requests service, the control structure must compare the priority of
the task with the priority of the currently executing task. If the priority of the
current task is higher, then the new request must be queued in some manner. When
any task completes execution, the control structure must select a new task to
execute. Also, switching the processor between tasks will generally involve
setting up some processor registers. However, all of these actions will occur for
every instance of a task requesting service, so these overhead costs can be
included in the maximum CPU time used by task i, $t_i$. The basic algorithm of
finding a task which can be assigned the worst priority while still satisfying (4-8)
is still correct.

$$l_i \ge t_i + \sum_{j=1}^{i-1} C_j(l_i) + \max_{j > i}(t_j) \tag{4-12}$$

Again, the assignment algorithm does not require any changes. This is obvious if
the algorithm finds a valid assignment of priorities. Increasing the priority of some
task relative to task i moves a task into the summation term in equation (4-12).
Since $C_j(t)$ is greater than or equal to $t_j$, making this change can only increase
the right hand side of (4-12).
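A small sketch of condition (4-12) follows. It reads the final term as the time task i may have to wait for one lower priority task that cannot be preempted. That reading is an interpretation, as are the assumed demand function $C_j(t) = \lfloor B_j t \rfloor t_j + \min(t_j, t - \lfloor B_j t \rfloor / B_j)$, the function names, and the example task list.

```python
from fractions import Fraction

def cpu_demand(t, period, cost):
    """Assumed form of C_j(t); period = 1/B_j, cost = t_j."""
    t, period, cost = Fraction(t), Fraction(period), Fraction(cost)
    full = t // period
    return full * cost + min(cost, t - full * period)

def blocking_bounds(tasks):
    """Right-hand side of (4-12) for each task. `tasks` is a list of
    (cost, period, latency) tuples ordered from highest to lowest priority."""
    bounds = []
    for i, (cost, _period, latency) in enumerate(tasks):
        # Demand of all higher priority tasks over a window of length l_i.
        higher = sum(cpu_demand(latency, p, c) for c, p, _l in tasks[:i])
        # Longest lower priority task that may already be running.
        blocking = max((c for c, _p, _l in tasks[i + 1:]), default=0)
        bounds.append(cost + higher + blocking)
    return bounds

tasks = [(1, 4, 4), (2, 8, 8), (3, 16, 16)]   # hypothetical values
bounds = blocking_bounds(tasks)
print(all(b <= latency for b, (_c, _p, latency) in zip(bounds, tasks)))
```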


4.4.3:  Non-Distinct  Priorities 

For various reasons it may be desirable to assign several tasks identical
priorities. For example, the computer hardware may only support a limited number
of interrupt priorities. Since the control structure is free to execute any of the
requested tasks having the highest priority, all tasks having the same priority as
task i must be treated as if they had higher priorities when checking the latency
specifications. This assumes that the control structure only executes task i when
all other requested tasks have priorities strictly worse than $p_i$.

However, this also makes the often unrealistic assumption that a task can be
preempted by a task with equal priority. If this is not the case, it is necessary to
simulate the control structure on the worst case sequence of requests. It is not
sufficient to treat these tasks as if they had lower priority but are not preemptible,
since a pair of tasks can make a sequence of requests so that one of them
requests service again while the other is being executed. Therefore, the first task
can be executed twice while task i is waiting for service although task i is never
preempted.

4.5:  Applications  to  the  Control  Structure  Problem 

Verifying the real-time performance of a static priority scheduler on more
complex task structures is a straightforward extension of the verification for
independent tasks. A latency specification $l_j$ is satisfied if and only if all blocks in
the constraint path can always be executed during any interval of duration $l_j$. It
becomes slightly more complex to compute the amount of CPU time used by higher
priority tasks, since some tasks (blocks) will not be runnable when other tasks are
requested.

4.5.1: Chains of Independent Tasks

If no block appears in more than one constraint path, the constraint paths can
be treated as independent tasks. A task will never be interrupted by a request of
a predecessor if the real-time specifications are met, since the period between
requests is not less than the deadline for any one request.

The priority assignment problem would be very much more difficult if it were
necessary to consider assigning different priorities to individual blocks in a chain.
However, it does not make sense to assign lower priorities to some blocks in the
constraint path, since it makes no difference where in the chain higher priority
tasks are allowed to interrupt. Therefore, all the tasks in the chain can be
assigned the same priority as the task in the chain with the least priority.

In the presence of overhead it is more efficient to create one 'super-task' that
executes all the blocks consecutively rather than incurring the overhead of a
request for each block in the chain. However, if the control structure is
non-preemptive it may be necessary to create several smaller 'super-tasks' to reduce
the amount of time that must be spent waiting for low priority tasks to complete.
The decision on how many tasks to create and how large to make them could be
based on how much CPU time needs to be freed up in order to find a task to
assign the currently worst priority.

4.5.2: More Complex Task Relations

There are fundamentally two ways different constraint paths can have a common
block: the common block can have more than one successor or it can have more
than one predecessor. We will first consider the simplest example of each type of
interdependent constraints.

Consider a block diagram in which block A has successors B and C. The
constraint paths for this diagram are AB and AC. Since a request for A will always
cause requests for both B and C, $B_{AB} = B_{AC}$. Therefore, neither B nor C will be
interrupted by requests for A as long as the real-time specifications are met.

Now, if $p_B > p_C$ then the sequence of blocks executed whenever A is requested
is ABC. Otherwise the sequence ACB will be executed. We can therefore replace
the tasks A, B, and C by a task that executes either ABC or ACB. The latency
specification for the new task should be chosen so that it will be satisfied if and
only if the original latency specifications are satisfied. These latency
specifications are satisfied if and only if:

$$l_{AB} \ge t_A + t_B + (\text{time lost to interrupts}) \tag{4-13}$$

and

$$l_{AC} \ge t_A + t_C + (\text{time lost to interrupts}) \tag{4-14}$$

The CPU time used by interrupting tasks will be identical for both the ABC and ACB
sequence, except if ABC is executed, then B must be considered an interrupting
task in equation (4-14), and similarly for C and equation (4-13). Therefore:

$$l_{ABC} = \min(l_{AB},\; l_{AC} - t_B) \tag{4-15}$$

and

$$l_{ACB} = \min(l_{AC},\; l_{AB} - t_C) \tag{4-16}$$

and we should choose the sequence that yields the greater latency.

Now consider a block diagram in which C has two predecessors A and B. The
constraint paths for this block diagram are AC and BC. It is also quite possible to
receive a request for C while C is already requested or suspended. However, if C
was first requested by A, the additional request will always be from B and vice
versa. If this occurs the logical thing to do is to have C executed only once, but
in general the sequence AC will be executed whenever A is requested and BC will
be executed whenever B is requested.

It is sufficient to replace A, B, and C by two tasks which execute AC and BC
respectively, ignoring the possibility that at times C may not need to be executed
by one of the tasks. However, if no assignment of priorities is found treating these
tasks as independent, it is not necessarily true that no such assignment would
exist if the common block C were handled more carefully. The difficulty is that the
worst case sequence of requests becomes harder to construct.

4.5.3: Combining Static and Dynamic Control Structures

Rather  than  having  the  processor  idle  when  no  tasks  are  requested,  it  may  be 
possible  to  have  the  processor  executing  a static  control  structure  for  some 
portion  of  the  block  diagram.  In  this  case  we  would  consider  the  static  control 
structure  to  be  the  lowest  priority  task.  There  are  no  real-time  specifications  on 
this  task  in  the  usual  sense,  although  we  must  still  guarantee  the  latencies  in  the 
static  control  structure.  This  can  be  done  by  modifying  the  latency  specifications 
so  that  even  when  the  maximum  amount  of  CPU  time  is  used  by  the  dynamic  tasks, 
the  static  control  structure  still  runs  often  enough. 
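One way to picture this adjustment is to subtract, from each latency specification of the static control structure, the worst-case CPU time the dynamic tasks can consume in a window of that length. The sketch below does this under the assumption that the demand of each dynamic task has the form $C_j(t) = \lfloor B_j t \rfloor t_j + \min(t_j, t - \lfloor B_j t \rfloor / B_j)$. The function names and the numbers are hypothetical.

```python
from fractions import Fraction

def cpu_demand(t, period, cost):
    """Assumed form of C_j(t); period = 1/B_j, cost = t_j."""
    t, period, cost = Fraction(t), Fraction(period), Fraction(cost)
    full = t // period
    return full * cost + min(cost, t - full * period)

def static_latency(latency, dynamic_tasks):
    """Latency specification left for the static control structure after the
    dynamic tasks, given as (period, cost) pairs, take their worst-case share
    of a window of length `latency`."""
    used = sum(cpu_demand(latency, p, c) for p, c in dynamic_tasks)
    return Fraction(latency) - used

# A window of 16 units shared with two hypothetical dynamic tasks.
print(static_latency(16, [(4, 1), (8, 2)]))
```

A result that is zero or negative would indicate that the dynamic tasks can occupy the whole window, so the static specification cannot be guaranteed.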

Consider a latency specification $l_i$ for $C_i$. The blocks in $C_i$ must be executed
once in every interval of duration $l_i$. The trace of the processor is no longer
completely determined by the static control structure since the dynamically
scheduled tasks will interrupt the static control structure. However, the amount of
CPU time used by these tasks is known. Therefore, we need only choose new
latency specifications for the statically executed constraints according to the
following equation:

$$l_i' = l_i - \sum_{j=1}^{k} C_j(l_i) \tag{4-17}$$

where constraints 1 through k are executed by the static priority interrupt control
structure.

5: Multiple Processor Control Structures

So  far  we  have  only  considered  control  structures  using  a single  processor.  This 
chapter  discusses  some  of  the  issues  involved  in  making  use  of  more  than  one 
processor. 

The first question to consider is how using more than one processor
improves the real-time performance of a block diagram schema. For static control
structures, implementing constraint paths as control structures on separate
processors improves both the latency performance and the bandwidth performance
by decreasing the weights of the critical windows and decreasing the weight of
the cycle. At the limit where each constraint path is implemented on its own
processor, the latency performance for constraint $C_i$ is $2|C_i|$ and the
bandwidth performance is $1/|C_i|$.

Similarly for dynamic control structures, if each constraint path were implemented
on a separate processor then each could run at the highest priority. The latency
performance would be $|C_i|$ and the bandwidth performance would be $1/|C_i|$.

However, these figures are not the best achievable. Each processor could
execute only a single block, but then data must be transferred between
processors. The interprocessor communication time may or may not be negligible
depending on how the processors are interconnected. If data is transferred using
an asynchronous serial transmission protocol, then at 9600 baud it would take
about one millisecond to transfer one byte between processors. Data values are
likely to take from one to four bytes, and a few milliseconds is a comparatively
large time, even on relatively slow microcomputers. On the other hand, if the data
is transmitted eight bits in parallel, the communication time may be negligible.

Even if the communication cost is negligible, executing a single block on each
processor does not improve the latency performance when each processor is
running a static control structure. Consider the example shown in figure 5-1.

A simple multi-processor control structure
Figure 5-1

Assume processor one is executing A* and processor two is executing B*. The
latency from a to b is $2t_A$. If the processors were synchronized so that processor
two started executing B as soon as processor one finished executing A, then the
latency from a to c would be $2t_A + t_B$. The processors are not synchronized, but
the phase difference cannot be more than the period of either cycle. Therefore,
$l_{a,c}$ is $2t_A + t_B + \min(t_A, t_B)$. But then if $t_B$ is less than $t_A$, $l_{a,c}$ is
$2t_A + 2t_B$, exactly the same as if processor one were executing (AB)*. What, if
anything, has been gained by using two processors? The latency performance has
not been improved, but the bandwidth performance has been improved to
$\min(1/t_A, 1/t_B) = 1/t_A$.

For the constraint $C_i = c_{i_1} \cdots c_{i_k}$ the latency for the entire path is:

$$\sum_{j=1}^{k} |c_{i_j}| + |c_{i_1}| + \sum_{j=1}^{k-1} \min(|c_{i_j}|, |c_{i_{j+1}}|) \tag{5-1}$$

The first summation in (5-1) is the basic CPU time needed to execute the path.
The rest of (5-1) is the phase delay between processors.

If the interprocessor communication times were not negligible, they would
increase the phase delay between processors. Let $IPCT_{j,j+1}$ be the
interprocessor communication time between the processor where $c_{i_j}$ is executed
and the processor where $c_{i_{j+1}}$ is executed. Equation (5-1) becomes:

$$\sum_{j=1}^{k} |c_{i_j}| + |c_{i_1}| + \sum_{j=1}^{k-1} \min(|c_{i_j}|, |c_{i_{j+1}}|) + \sum_{j=1}^{k-1} IPCT_{j,j+1} \tag{5-2}$$
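The latency of an unsynchronized chain of processors can be evaluated with a short function. The sketch assumes the path latency is the sum of all block times, plus the first block time once more, plus $\min$ of each adjacent pair of block times, plus any interprocessor communication times. The function name and argument layout are hypothetical.

```python
def unsynchronized_latency(times, ipct=None):
    """Path latency when block j runs alone in a static loop on processor j.
    `times` holds the block execution times in path order; `ipct` optionally
    holds the communication time between each adjacent pair of processors."""
    if not times:
        raise ValueError("the constraint path must contain at least one block")
    k = len(times)
    if ipct is None:
        ipct = [0] * (k - 1)            # negligible communication cost
    if len(ipct) != k - 1:
        raise ValueError("need one communication time per adjacent pair")
    execution = sum(times)              # basic CPU time for the path
    phase = times[0] + sum(min(times[j], times[j + 1]) for j in range(k - 1))
    return execution + phase + sum(ipct)

# Two blocks with t_A = 3 and t_B = 2: the result equals 2*t_A + 2*t_B.
print(unsynchronized_latency([3, 2]))
```

For a single block the function reduces to twice the block time, which matches the latency of a one-block static loop.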

If processor two could be synchronized with processor one then the latency
performance could be improved even more. Notice that for processor two, link b is
an input link. However, the input signal at link b does not change continuously.
Therefore, processor two should synchronize with processor one by executing a
dynamic control structure. If processor two can keep up with processor one (i.e. if
$t_B \le t_A$) then the latency performance would improve to $2t_A + t_B$.

If processor two cannot keep up with processor one, the latency performance
depends on whether or not processor one synchronizes with processor two. We
will assume that requests for B occurring when B is executing are remembered and
therefore processor two is executing B*. If processor one does not wait for
processor two, the processors are essentially executing static control
structures.

If processor one synchronizes with processor two by idling until processor two is
ready to accept the next request, processor one still executes A*, but A is
executed once every $t_B$ units instead of once every $t_A$ units. The latency
performance becomes $2t_B + t_B$. For the constraint $C_i = c_{i_1} \cdots c_{i_k}$ it is
necessary to synchronize all k processors so each processor will idle until the next
processor is ready to accept a new request. Then the latency performance becomes:

$$2 \max_{j=1}^{k}(|c_{i_j}|) + \sum_{j=2}^{k} |c_{i_j}| \tag{5-3}$$

The first term in (5-3) is the latency performance of the static control structure
running on the first processor when synchronized to the slowest block in the
constraint path. The bandwidth performance of the multi-processor control
structure is:

$$\frac{1}{\max_{j=1}^{k}(|c_{i_j}|)} \tag{5-4}$$

If the interprocessor communications costs are not negligible, equation (5-3)
becomes:

$$2 \max_{j=1}^{k}(|c_{i_j}|) + \sum_{j=2}^{k} |c_{i_j}| + \sum_{j=1}^{k-1} IPCT_{j,j+1} \tag{5-5}$$
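The synchronized case can be sketched in the same way. The functions below assume the latency is twice the slowest block time, plus the times of all blocks after the first, plus any communication times, and that the bandwidth is the reciprocal of the slowest block time. The names are hypothetical.

```python
from fractions import Fraction

def synchronized_latency(times, ipct=()):
    """Path latency when every processor idles until its successor is ready.
    `times` holds block execution times in path order; `ipct` optionally holds
    the communication time between each adjacent pair of processors."""
    if len(ipct) not in (0, len(times) - 1):
        raise ValueError("need one communication time per adjacent pair")
    # The first processor cycles at the pace of the slowest block.
    return 2 * max(times) + sum(times[1:]) + sum(ipct)

def synchronized_bandwidth(times):
    """Requests served per unit time: limited by the slowest block."""
    return 1 / Fraction(max(times))

print(synchronized_latency([3, 2]))   # t_B <= t_A: 2*t_A + t_B
print(synchronized_latency([2, 3]))   # t_B >  t_A: 2*t_B + t_B
print(synchronized_bandwidth([2, 3]))
```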

Notice that in general, implementing each constraint path on a separate
processor will improve both the bandwidth and latency performance of the control
structure. Splitting a constraint path across several processors may not improve
the latency performance, especially if the communication costs are not negligible.
However, this will improve the bandwidth performance.

5.1: Assigning Control Structures to Multiple Processors

If the real-time specifications do not exceed the bounds implied by the equations
derived in the previous section, then the specifications can be met by a control
structure which assigns one block per processor. Although one can argue that
computers are cheap, and getting cheaper all the time, they are not free.
Therefore, we are generally interested in finding a control structure that satisfies
the real-time specifications and uses a minimal number of processors.

Unfortunately, this problem is computationally intractable. It has been shown
that the problem of minimizing the number of processors for a dynamic control
structure is NP-complete for the special case of independent tasks and
deadlines coinciding with the next request for each task [4]. Also, Al Mok has
discovered that the problem of minimizing the number of processors needed to
implement a static control structure is NP-complete [17].

On the other hand, experience with similar problems has shown that reasonable
heuristics may exist. Dhall's work shows that statically assigning independent
tasks to processors running an earliest deadline scheduler is directly equivalent to
the bin-packing problem. Although this is an NP-complete problem, several heuristics
are known that are sub-optimal by a bounded ratio. These algorithms are directly
applicable to the scheduling problem. Assigning tasks to processors running static
priority schedulers is not equivalent to the bin-packing problem, but Dhall has
established similar bounds for simple first-fit and next-fit algorithms.
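To make the bin-packing view concrete, a generic first-fit heuristic is sketched below. It is not taken from Dhall's work. It assumes each task is summarized by its utilization $B_j t_j$ and that a processor running an earliest deadline scheduler can accept tasks as long as their total utilization does not exceed 1, which holds when deadlines coincide with the next request. The function name and example values are hypothetical.

```python
from fractions import Fraction

def first_fit(utilizations, capacity=1):
    """Assign each task, in the order given, to the first processor that
    still has room for it. Returns one list of task indices per processor."""
    loads = []        # current total utilization of each processor
    assigned = []     # task indices placed on each processor
    for index, u in enumerate(utilizations):
        u = Fraction(u)
        if u > capacity:
            raise ValueError("task %d cannot fit on any processor" % index)
        for p, load in enumerate(loads):
            if load + u <= capacity:
                loads[p] += u
                assigned[p].append(index)
                break
        else:
            loads.append(u)           # open a new processor
            assigned.append([index])
    return assigned

tasks = [Fraction(1, 2), Fraction(2, 3), Fraction(1, 3), Fraction(1, 2), Fraction(1, 4)]
print(first_fit(tasks))
```

A next-fit variant would only ever test the most recently opened processor, trading a possibly larger processor count for less searching.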

Therefore, it is reasonable to expect that similar bounds could be derived for
an algorithm that assigns constraints to processors using a first-fit or next-fit
strategy. Some other factors should affect the assignment of tasks to processors.
If the block diagram can be partitioned into disjoint subdiagrams, and the
subdiagrams are assigned to processors as a unit, then no interprocessor
communication is needed. However, the bandwidth or latency specifications on a
constraint path may require that the blocks of the constraint path be split among
several processors.

5.2: Dynamic Assignment of Processors

It doesn't make sense to assign processors dynamically if each processor is
running a static control structure. However, if the processors are running dynamic
control structures (i.e. if we have requests for tasks), then a control structure
might do better by not assigning a processor to a task until it requests service.
Unfortunately, there are no known algorithms for scheduling more than one
processor in an optimal manner in the sense that the earliest deadline scheduler is
optimal for a single processor. In fact, Mok has shown that such an algorithm must
have knowledge about future requests. Furthermore, limitations on the set of
tasks that dynamic schedulers can guarantee to schedule to meet their deadlines
are comparable to the restrictions imposed by statically assigning tasks to
processors.

6: Summary and Conclusions

We have presented a model for real-time computations that provides precise
definitions of real-time performance. The model has the additional advantage of
strongly corresponding to intuition. This makes the model ideal for defining the
semantics of a real-time programming language. The model also avoids close
association with any implementation. Therefore, the model is applicable to a wide
variety of systems. Conversely, a language based on this model should be easily
implementable in a wide variety of ways, without encountering features of the
model too finely tuned to a particular implementation.

Several strategies for implementing control structures for block diagram systems
were investigated. The first strategy was to find a static execution order for the
blocks in the diagram. Control structures of this type have been somewhat ignored
for time critical applications. An important result is that any such control structure
could be represented as a finite cycle, although the bounds on the length of the
cycle are so large that explicit enumeration is impractical as a synthesis technique.
A branch-and-bound synthesis method was developed, but unfortunately it is also
impractical for large problems. We suspect that the synthesis problem is
NP-complete (computationally intractable), but have not proved this conjecture. In any
case, we believe it is more promising to investigate fast heuristic algorithms for
synthesizing static control structures.

The next general strategy investigated made use of the fact that in many
applications the input values change at discrete times. Under this assumption,
block diagram schemata are closer to traditional models of real-time computations.
Previous research has found optimal schedulers for the special case of one
processor and independent tasks. However, simpler static priority schedulers had
been ignored except for the special case of the latency specifications being
identical to the bandwidth period. We developed an efficient algorithm for assigning
priorities to independent tasks when the latency specification is less than the
bandwidth period. The synthesis techniques were modified to construct control
structures for block diagram schemata in which the blocks were not independent.

Since  the  analysis  of  the  real-time  performance  of  block  diagram  schemata  under 
a static  priority  control  structure  is  similar  to  the  analysis  of  static  priority 
queueing  systems,  the  priority  assignment  algorithm  can  also  be  applied  to  priority 
queueing  systems. 

Finally, we discussed some of the issues that arise when more than one
processor is available to the control structure. The real-time performance of
multiprocessor control structures was analyzed, and absolute bounds on the
real-time performance for a block diagram schema were derived. If the real-time
specifications can be met by a multiprocessor control structure, the objective
becomes minimizing the number of processors needed to implement a feasible
control structure. Several special cases are known to be NP-complete, so the
general problem is at least as hard (NP-hard). However, there is reason to believe
that simple algorithms will produce control structures using a number of processors
that differs from the minimal number by a bounded factor, although no specific
algorithms were investigated.

Future work should probably concentrate on either proving various synthesis
problems to be NP-complete or finding efficient algorithms. In the event the
problems are intractable, the performance of efficient heuristic algorithms should be
studied. Certainly any implementation of a practical language system based on
block diagram schemata should attempt to find and improve such heuristic methods.
A practical system should also attempt to make use of more of the special cases for
which efficient algorithms are known.